Week Ending 9.1.2024

RESEARCH WATCH: 9.1.2024

Modularity in Transformers: Investigating Neuron Separability & Specialization

This paper investigates the internal workings of transformer models, focusing on understanding the modularity and task specialization of neurons within these architectures. The findings reveal evidence of task-specific neuron clusters, with varying degrees of overlap between related tasks. This work contributes to a more nuanced understanding of transformer internals and offers insights into potential avenues for improving model interpretability and efficiency.

Authors: Nicholas Pochinkov, Thomas Jones, Mohammed Rashidur Rahman

Link: https://arxiv.org/abs/2408.17324v1

Date: 2024-08-30

Summary:

Transformer models are increasingly prevalent in various applications, yet our understanding of their internal workings remains limited. This paper investigates the modularity and task specialization of neurons within transformer architectures, focusing on both vision (ViT) and language (Mistral 7B) models. Using a combination of selective pruning and MoEfication clustering techniques, we analyze the overlap and specialization of neurons across different tasks and data subsets. Our findings reveal evidence of task-specific neuron clusters, with varying degrees of overlap between related tasks. We observe that neuron importance patterns persist to some extent even in randomly initialized models, suggesting an inherent structure that training refines. Additionally, we find that neuron clusters identified through MoEfication correspond more strongly to task-specific neurons in earlier and later layers of the models. This work contributes to a more nuanced understanding of transformer internals and offers insights into potential avenues for improving model interpretability and efficiency.

--------------------------------------------------------------------------------------------------------

Stationary Policies are Optimal in Risk-averse Total-reward MDPs with EVaR

This paper shows that the risk-averse total reward criterion, under the Entropic Risk Measure (ERM) and Entropic Value at Risk (EVaR) risk measures, can be optimized by a stationary policy. This simplifies the analysis, interpretation, and deployment of these risk-averse reinforcement learning models, which have applications in a broad range of domains.

Authors: Xihong Su, Marek Petrik, Julien Grand-Clément

Link: https://arxiv.org/abs/2408.17286v1

Date: 2024-08-30

Summary:

Optimizing risk-averse objectives in discounted MDPs is challenging because most models do not admit direct dynamic programming equations and require complex history-dependent policies. In this paper, we show that the risk-averse total reward criterion, under the Entropic Risk Measure (ERM) and Entropic Value at Risk (EVaR) risk measures, can be optimized by a stationary policy, making it simple to analyze, interpret, and deploy. We propose exponential value iteration, policy iteration, and linear programming to compute optimal policies. In comparison with prior work, our results only require the relatively mild condition of transient MDPs and allow for both positive and negative rewards. Our results indicate that the total reward criterion may be preferable to the discounted criterion in a broad range of risk-averse reinforcement learning domains.
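
To make the two risk measures concrete, here is a minimal sketch that evaluates them for a discrete reward distribution. It assumes the reward-maximization convention ERM_beta[X] = -(1/beta) * log E[exp(-beta * X)] and EVaR_alpha[X] = sup over beta > 0 of (ERM_beta[X] + log(alpha) / beta); the paper's own notation may differ. The function names, the beta grid, and the example distribution are illustrative assumptions, and the supremum is approximated by a plain grid search rather than by any of the algorithms the paper proposes.

```python
import math


def erm(values, probs, beta):
    """Entropic risk measure of a discrete reward distribution (assumed convention)."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    # Log-sum-exp shift keeps exp() from overflowing for large beta or rewards.
    exponents = [-beta * v for v in values]
    shift = max(exponents)
    total = sum(p * math.exp(e - shift) for p, e in zip(probs, exponents))
    return -(shift + math.log(total)) / beta


def evar(values, probs, alpha, betas):
    """Grid-search approximation of EVaR: max over the supplied beta values."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    return max(erm(values, probs, b) + math.log(alpha) / b for b in betas)


# Hypothetical example: a fair coin flip between reward 0 and reward 10.
rewards = [0.0, 10.0]
probs = [0.5, 0.5]
beta_grid = [0.01 * k for k in range(1, 1001)]  # assumed search grid

mean_reward = sum(p * v for p, v in zip(probs, rewards))
print("mean:", mean_reward)
print("ERM (beta=1):", erm(rewards, probs, 1.0))
print("EVaR (alpha=0.9):", evar(rewards, probs, 0.9, beta_grid))
```

Both measures sit at or below the mean reward, and ERM shrinks as beta grows, which is the sense in which a larger beta expresses stronger risk aversion.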

--------------------------------------------------------------------------------------------------------

ModalityMirror: Improving Audio Classification in Modality Heterogeneity Federated Learning with Multimodal Distillation

This work addresses the challenge of modality heterogeneity in federated learning, where the audio modality often underperforms compared to the visual modality. ModalityMirror uses knowledge distillation to improve the audio model performance by leveraging the stronger audiovisual federated learning model, unlocking the potential of exploiting diverse modality spectrums in multi-modal federated learning.

Authors: Tiantian Feng, Tuo Zhang, Salman Avestimehr, Shrikanth S. Narayanan

Link: https://arxiv.org/abs/2408.15803v1

Date: 2024-08-28

Summary:

Multimodal Federated Learning frequently encounters challenges of client modality heterogeneity, leading to undesirable performance for the secondary modality in multimodal learning. This is particularly prevalent in audiovisual learning, where audio is often assumed to be the weaker modality in recognition tasks. To address this challenge, we introduce ModalityMirror to improve audio model performance by leveraging knowledge distillation from an audiovisual federated learning model. ModalityMirror involves two phases: a modality-wise FL stage to aggregate uni-modal encoders; and a federated knowledge distillation stage on multi-modality clients to train a unimodal student model. Our results demonstrate that ModalityMirror significantly improves audio classification compared to state-of-the-art FL methods such as Harmony, particularly in audiovisual FL when video is missing. Our approach unlocks the potential for exploiting the diverse modality spectrum inherent in multi-modal FL.
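
The distillation stage can be pictured with a generic loss sketch. The summary does not state which distillation objective ModalityMirror uses, so the code below assumes the common temperature-softened KL divergence between teacher and student class probabilities. The function names, the temperature value, and the logits are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax with a max-shift for numerical stability."""
    scaled = [z / temperature for z in logits]
    shift = max(scaled)
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened probabilities, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    return temperature ** 2 * kl


# Hypothetical logits: an audiovisual teacher and an audio-only student.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 1.5, 1.0]
print("distillation loss:", distillation_loss(student, teacher))
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge.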

--------------------------------------------------------------------------------------------------------

Secrecy Performance Analysis of RIS-Aided Fluid Antenna Systems

This paper examines the impact of fluid antenna systems (FAS) on reconfigurable intelligent surface (RIS)-aided secure communications. The authors derive analytical expressions for the secrecy outage probability, revealing how the incorporation of FAS and RIS can significantly enhance the performance of secure communications.

Authors: Farshad Rostami Ghadi, Kai-Kit Wong, Masoud Kaveh, F. Javier Lopez-Martinez, Wee Kiat New, Hao Xu

Link: https://arxiv.org/abs/2408.14969v1

Date: 2024-08-27

Summary:

This paper examines the impact of emerging fluid antenna systems (FAS) on reconfigurable intelligent surface (RIS)-aided secure communications. Specifically, we consider a classic wiretap channel, where a fixed-antenna transmitter sends confidential information to an FAS-equipped legitimate user with the help of an RIS, while an FAS-equipped eavesdropper attempts to decode the message. To evaluate the proposed wireless scenario, we first introduce the cumulative distribution function (CDF) and probability density function (PDF) of the signal-to-noise ratio (SNR) at each node, using the central limit theorem and the Gaussian copula function. We then derive a compact analytical expression for the secrecy outage probability (SOP). Our numerical results reveal how the incorporation of FAS and RIS can significantly enhance the performance of secure communications.
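
For readers unfamiliar with the metric, the sketch below estimates a secrecy outage probability empirically. It assumes the standard wiretap definitions: secrecy capacity C_s = max(log2(1 + SNR_B) - log2(1 + SNR_E), 0), and an outage whenever C_s falls below a target secrecy rate. The paper derives a compact analytical expression instead; the SNR samples, target rate, and function names here are made up for illustration.

```python
import math


def secrecy_capacity(snr_bob, snr_eve):
    """Secrecy capacity in bits/s/Hz for linear-scale SNRs (assumed definition)."""
    return max(math.log2(1 + snr_bob) - math.log2(1 + snr_eve), 0.0)


def empirical_sop(snr_bob_samples, snr_eve_samples, target_rate):
    """Fraction of paired SNR samples whose secrecy capacity is below target_rate."""
    pairs = list(zip(snr_bob_samples, snr_eve_samples))
    if not pairs:
        raise ValueError("need at least one SNR sample pair")
    outages = sum(1 for b, e in pairs if secrecy_capacity(b, e) < target_rate)
    return outages / len(pairs)


# Hypothetical SNR draws for the legitimate user (Bob) and the eavesdropper (Eve).
bob = [15.0, 3.0, 7.0, 1.0]
eve = [1.0, 1.0, 1.0, 3.0]
print("empirical SOP:", empirical_sop(bob, eve, target_rate=0.5))
```

In a real study the sample pairs would come from a channel model; here they are fixed numbers so the calculation can be followed by hand.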

--------------------------------------------------------------------------------------------------------

Detecting AI Flaws: Target-Driven Attacks on Internal Faults in Language Models

This paper proposes a target-driven attack paradigm to detect internal faults in large language models (LLMs), which can lead to the generation of inappropriate outputs. The method uses another LLM as a detector to efficiently identify the tendencies of target LLMs to generate harmful responses, contributing to the understanding and strengthening of LLM security.

Authors: Yuhao Du, Zhuo Li, Pengyu Cheng, Xiang Wan, Anningzhe Gao

Link: https://arxiv.org/abs/2408.14853v1

Date: 2024-08-27

Summary:

Large Language Models (LLMs) have become a focal point in the rapidly evolving field of artificial intelligence. However, a critical concern is the presence of toxic content within the pre-training corpus of these models, which can lead to the generation of inappropriate outputs. Investigating methods for detecting internal faults in LLMs can help us understand their limitations and improve their security. Existing methods primarily focus on jailbreaking attacks, which involve manually or automatically constructing adversarial content to prompt the target LLM to generate unexpected responses. These methods rely heavily on prompt engineering, which is time-consuming and usually requires specially designed questions. To address these challenges, this paper proposes a target-driven attack paradigm that focuses on directly eliciting the target response instead of optimizing the prompts. We introduce the use of another LLM as the detector for toxic content, referred to as ToxDet. Given a target toxic response, ToxDet can generate a possible question and a preliminary answer to provoke the target model into producing desired toxic responses with meanings equivalent to the provided one. ToxDet is trained by interacting with the target LLM and receiving reward signals from it, utilizing reinforcement learning for the optimization process. While the primary focus of the target models is on open-source LLMs, the fine-tuned ToxDet can also be transferred to attack black-box models such as GPT-4o, achieving notable results. Experimental results on AdvBench and HH-Harmless datasets demonstrate the effectiveness of our methods in detecting the tendencies of target LLMs to generate harmful responses. This algorithm not only exposes vulnerabilities but also provides a valuable resource for researchers to strengthen their models against such attacks.

--------------------------------------------------------------------------------------------------------

Visions of Destruction: Exploring a Potential of Generative AI in Interactive Art

This paper presents the interactive artwork "Visions of Destruction," which uses generative AI to create a dynamic, audience-responsive experience that symbolizes the impact of human activities on the environment. The work demonstrates the potential of generative AI to revolutionize artistic expression, audience engagement, and the interactive art field.

Authors: Mar Canet Sola, Varvara Guljajeva

Link: https://arxiv.org/abs/2408.14644v1

Date: 2024-08-26

Summary:

This paper explores the potential of generative AI within interactive art, employing a practice-based research approach. It presents the interactive artwork "Visions of Destruction" as a detailed case study, highlighting its innovative use of generative AI to create a dynamic, audience-responsive experience. This artwork applies gaze-based interaction to dynamically alter digital landscapes, symbolizing the impact of human activities on the environment by generating contemporary collages created with AI, trained on data about human damage to nature, and guided by audience interaction. The transformation of pristine natural scenes into human-made and industrialized landscapes through viewer interaction serves as a stark reminder of environmental degradation. The paper thoroughly explores the technical challenges and artistic innovations involved in creating such an interactive art installation, emphasizing the potential of generative AI to revolutionize artistic expression, audience engagement, and especially the opportunities for the interactive art field. It offers insights into the conceptual framework behind the artwork, aiming to evoke a deeper understanding and reflection on the Anthropocene era and human-induced climate change. This study contributes significantly to the field of creative AI and interactive art, blending technology and environmental consciousness in a compelling, thought-provoking manner.

--------------------------------------------------------------------------------------------------------

Rate-Distortion-Perception Controllable Joint Source-Channel Coding for High-Fidelity Generative Communications

This paper proposes a rate-distortion-perception (RDP) jointly optimized joint source-channel coding (JSCC) framework to enhance the perception quality of end-to-end image transmission in intelligent wireless communications. The framework integrates generative adversarial networks to provide detailed and realistic image reconstructions at the receiver, addressing the limitations of traditional rate-distortion optimized solutions.

Authors: Kailin Tan, Jincheng Dai, Zhenyu Liu, Sixian Wang, Xiaoqi Qin, Wenjun Xu, Kai Niu, Ping Zhang

Link: https://arxiv.org/abs/2408.14127v1

Date: 2024-08-26

Summary:

End-to-end image transmission has recently become a crucial trend in intelligent wireless communications, driven by the increasing demand for high bandwidth efficiency. However, existing methods primarily optimize the trade-off between bandwidth cost and objective distortion, often failing to deliver visually pleasing results aligned with human perception. In this paper, we propose a novel rate-distortion-perception (RDP) jointly optimized joint source-channel coding (JSCC) framework to enhance perception quality in human communications. Our RDP-JSCC framework integrates a flexible plug-in conditional Generative Adversarial Network (GAN) to provide detailed and realistic image reconstructions at the receiver, overcoming the limitations of traditional rate-distortion optimized solutions that typically produce blurry or poorly textured images. Based on this framework, we introduce a distortion-perception controllable transmission (DPCT) model, which addresses the variation in the perception-distortion trade-off. DPCT uses a lightweight spatial realism embedding module (SREM) to condition the generator on a realism map, enabling the customization of appearance realism for each image region at the receiver from a single transmission. Furthermore, for scenarios with scarce bandwidth, we propose an interest-oriented content-controllable transmission (CCT) model. CCT prioritizes the transmission of regions that attract user attention and generates other regions from an instance label map, ensuring both content consistency and appearance realism for all regions while proportionally reducing channel bandwidth costs. Comprehensive experiments demonstrate the superiority of our RDP-optimized image transmission framework over state-of-the-art engineered image transmission systems and advanced perceptual methods.

--------------------------------------------------------------------------------------------------------

Evaluating the Energy Consumption of Machine Learning: Systematic Literature Review and Experiments

This work addresses the need to monitor, understand, and optimize the energy consumption of machine learning (ML) models. The authors conduct a systematic literature review and develop an experimental protocol to compare various tools and methods for evaluating the energy consumption of ML, providing a comprehensive guide for researchers and practitioners.

Authors: Charlotte Rodriguez, Laura Degioanni, Laetitia Kameni, Richard Vidal, Giovanni Neglia

Link: https://arxiv.org/abs/2408.15128v1

Date: 2024-08-27

Summary:

Monitoring, understanding, and optimizing the energy consumption of Machine Learning (ML) are among the reasons why it is necessary to evaluate the energy usage of ML. However, there exists no universal tool that can do this for all use cases, and there may even be disagreement on how to evaluate energy consumption for a specific use case. Tools and methods are based on different approaches, each with its own advantages and drawbacks, and they need to be mapped out and explained in order to select the most suitable one for a given situation. We address this challenge through two approaches. First, we conduct a systematic literature review of all tools and methods that can be used to evaluate the energy consumption of ML (both at training and at inference), irrespective of whether they were originally designed for machine learning or general software. Second, we develop and use an experimental protocol to compare a selection of these tools and methods. The comparison is both qualitative and quantitative on a range of ML tasks of different nature (vision, language) and computational complexity. The systematic literature review serves as a comprehensive guide for understanding the array of tools and methods used in evaluating energy consumption of ML, for various use cases going from basic energy monitoring to consumption optimization. Two open-source repositories are provided for further exploration. The first one contains tools that can be used to replicate this work or extend the current review. The second repository houses the experimental protocol, allowing users to augment the protocol with new ML computing tasks and additional energy evaluation tools.

--------------------------------------------------------------------------------------------------------

Speeding Ticket: Unveiling the Energy and Emission Burden of AI-Accelerated Distributed and Decentralized Power Dispatch Models

This paper investigates the environmental impact of using AI and machine learning technologies to enhance the efficiency of power dispatch operations in distributed electrical grid systems. The study highlights the critical trade-offs between operational efficiency and environmental sustainability, guiding future AI implementations in energy systems.

Authors: Meiyi Li, Javad Mohammadi

Link: https://arxiv.org/abs/2408.13968v1

Date: 2024-08-26

Summary:

As the modern electrical grid shifts towards distributed systems, there is an increasing need for rapid decision-making tools. Artificial Intelligence (AI) and Machine Learning (ML) technologies are now pivotal in enhancing the efficiency of power dispatch operations, effectively overcoming the constraints of traditional optimization solvers with long computation times. However, this increased efficiency comes at a high environmental cost, escalating energy consumption and carbon emissions from computationally intensive AI/ML models. Despite their potential to transform power systems management, the environmental impact of these technologies often remains an overlooked aspect. This paper introduces the first comparison of energy demands across centralized, distributed, and decentralized ML-driven power dispatch models. We provide a detailed analysis of the energy and carbon footprint required for continuous operations on an IEEE 33 bus system, highlighting the critical trade-offs between operational efficiency and environmental sustainability. This study aims to guide future AI implementations in energy systems, ensuring they not only enhance efficiency but also prioritize ecological integrity.

--------------------------------------------------------------------------------------------------------

Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts

This work focuses on "upcycling" dense expert models into a Mixture of Experts (MoE) architecture, aiming to improve specialization while also adding the ability to adapt to new tasks easily. The proposed Nexus model introduces an enhanced MoE architecture with adaptive routing, enabling flexible addition of new experts without requiring large-scale MoE training for unseen data domains.

Authors: Nikolas Gritsch, Qizhen Zhang, Acyr Locatelli, Sara Hooker, Ahmet Üstün

Link: https://arxiv.org/abs/2408.15901v1

Date: 2024-08-28

Summary:

Efficiency, specialization, and adaptability to new data distributions are qualities that are hard to combine in current Large Language Models. The Mixture of Experts (MoE) architecture has been the focus of significant research because its inherent conditional computation enables such desirable properties. In this work, we focus on "upcycling" dense expert models into an MoE, aiming to improve specialization while also adding the ability to adapt to new tasks easily. We introduce Nexus, an enhanced MoE architecture with adaptive routing where the model learns to project expert embeddings from domain representations. This approach allows Nexus to flexibly add new experts after the initial upcycling through separately trained dense models, without requiring large-scale MoE training for unseen data domains. Our experiments show that Nexus achieves a relative gain of up to 2.1% over the baseline for initial upcycling, and an 18.8% relative gain for extending the MoE with a new expert by using limited finetuning data. This flexibility of Nexus is crucial to enable an open-source ecosystem where every user continuously assembles their own MoE-mix according to their needs.

--------------------------------------------------------------------------------------------------------

Brain-inspired Artificial Intelligence: A Comprehensive Review

This review explores the diverse design inspirations that have shaped modern AI models, categorizing them into physical structure-inspired and human behavior-inspired approaches. It examines the real-world applications where different brain-inspired AI models excel, highlighting their practical benefits and deployment challenges. The insights provided can help researchers and practitioners harness the potential of brain-inspired AI and expedite advancements in the field.

Authors: Jing Ren, Feng Xia

Link: https://arxiv.org/abs/2408.14811v1

Date: 2024-08-27

Summary:

Current artificial intelligence (AI) models often focus on enhancing performance through meticulous parameter tuning and optimization techniques. However, the fundamental design principles behind these models receive comparatively less attention, which can limit our understanding of their potential and constraints. This comprehensive review explores the diverse design inspirations that have shaped modern AI models, i.e., brain-inspired artificial intelligence (BIAI). We present a classification framework that categorizes BIAI approaches into physical structure-inspired and human behavior-inspired models. We also examine the real-world applications where different BIAI models excel, highlighting their practical benefits and deployment challenges. By delving into these areas, we provide new insights and propose future research directions to drive innovation and address current gaps in the field. This review offers researchers and practitioners a comprehensive overview of the BIAI landscape, helping them harness its potential and expedite advancements in AI development.

--------------------------------------------------------------------------------------------------------

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

This paper introduces Mini-Omni, an audio-based end-to-end conversational model capable of real-time speech interaction. The model uses a text-instructed speech generation method and batch-parallel strategies to enable seamless speech interaction, without the need for additional text-to-speech systems. This work offers valuable potential for future research on developing language models with integrated speech capabilities.

Authors: Zhifei Xie, Changqiao Wu

Link: https://arxiv.org/abs/2408.16725v2

Date: 2024-08-30

Summary:

Recent advances in language models have achieved significant progress. GPT-4o, as a new milestone, has enabled real-time conversations with humans, demonstrating near-human natural fluency. Such human-computer interaction necessitates models with the capability to perform reasoning directly with the audio modality and generate output in streaming. However, this remains beyond the reach of current academic models, as they typically depend on extra TTS systems for speech synthesis, resulting in undesirable latency. This paper introduces Mini-Omni, an audio-based end-to-end conversational model capable of real-time speech interaction. To achieve this capability, we propose a text-instructed speech generation method, along with batch-parallel strategies during inference to further boost the performance. Our method also helps to retain the original model's language capabilities with minimal degradation, enabling other works to establish real-time interaction capabilities. We call this training method "Any Model Can Talk". We also introduce the VoiceAssistant-400K dataset to fine-tune models optimized for speech output. To the best of our knowledge, Mini-Omni is the first fully end-to-end, open-source model for real-time speech interaction, offering valuable potential for future research.

--------------------------------------------------------------------------------------------------------

LoraMap: Harnessing the Power of LoRA Connections

This paper investigates methods to establish connections among multiple Low-Rank Adaptation (LoRA) models, which can be used to mitigate hallucinations and reduce computational overhead in large language models. The proposed LoraMap approach demonstrates superior performance on a fact-checking task compared to existing LoRA composition methods, while using significantly fewer parameters.

Authors: Hyeryun Park, Jeongwon Kwak, Dongsuk Jang, Sumin Park, Jinwook Choi

Link: https://arxiv.org/abs/2408.16264v1

Date: 2024-08-29

Summary:

Large Language Models (LLMs) can benefit from mitigating hallucinations through fact-checking and overcoming substantial computational overhead with parameter-efficient techniques such as Low-Rank Adaptation (LoRA). While some studies have explored the parallel integration of multiple LoRAs, these approaches pay little attention to the connections between them. This paper investigates methods to establish connections among multiple LoRAs. We create three reasoning datasets tailored to fact-checking and fine-tune individual LoRAs, allowing them to view and reason from diverse perspectives. Then, we explore strategies for allocating these reasoning LoRAs and introduce LoraMap, an approach to map connections between them. The results on the fact-checking task demonstrate that the performance of LoraMap is superior to LoraHub, an existing LoRA composition method. LoraMap also outperforms LoraConcat, which concatenates LoRAs and further fine-tunes them, while using significantly fewer parameters.
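
As background for why LoRA counts as parameter-efficient, the sketch below applies a single standard LoRA update, y = W x + (alpha / r) * B (A x), and counts its trainable parameters. It does not reproduce LoraMap, LoraHub, or LoraConcat, whose mechanisms the summary does not detail. The matrix shapes, the alpha scaling, and all names are illustrative assumptions.

```python
def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha=1.0):
    """Frozen weight W (d_out x d_in) plus low-rank update B (d_out x r) @ A (r x d_in)."""
    rank = len(A)
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base, low_rank)]


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters in one LoRA adapter: A has r*d_in, B has d_out*r."""
    return rank * (d_in + d_out)


# Hypothetical 2x2 layer with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[1.0], [0.0]]
print(lora_forward(W, A, B, [1.0, 2.0]))
print(lora_param_count(4096, 4096, 8), "adapter params vs", 4096 * 4096, "dense params")
```

For a 4096 x 4096 layer, a rank-8 adapter trains 65,536 values against roughly 16.8 million in the dense weight.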

--------------------------------------------------------------------------------------------------------

GenDDS: Generating Diverse Driving Video Scenarios with Prompt-to-Video Generative Model

This work presents GenDDS, a novel approach for generating diverse and realistic driving video scenarios by leveraging the capabilities of the Stable Diffusion XL model. The method uses descriptive prompts to guide the synthesis process, contributing to the development of sophisticated training data for autonomous driving systems and virtual environments for simulation and validation.

Authors: Yongjie Fu, Yunlong Li, Xuan Di

Link: https://arxiv.org/abs/2408.15868v1

Date: 2024-08-28

Summary:

Autonomous driving training requires a diverse range of datasets encompassing various traffic conditions, weather scenarios, and road types. Traditional data augmentation methods often struggle to generate datasets that represent rare occurrences. To address this challenge, we propose GenDDS, a novel approach for generating driving scenarios by leveraging the capabilities of Stable Diffusion XL (SDXL), an advanced latent diffusion model. Our methodology involves the use of descriptive prompts to guide the synthesis process, aimed at producing realistic and diverse driving scenarios. With the power of the latest computer vision techniques, such as ControlNet and Hotshot-XL, we have built a complete pipeline for video generation together with SDXL. We employ the KITTI dataset, which includes real-world driving videos, to train the model. Through a series of experiments, we demonstrate that our model can generate high-quality driving videos that closely replicate the complexity and variability of real-world driving scenarios. This research contributes to the development of sophisticated training data for autonomous driving systems and opens new avenues for creating virtual environments for simulation and validation purposes.

--------------------------------------------------------------------------------------------------------

A Comprehensive Benchmark of Machine and Deep Learning Across Diverse Tabular Datasets

This study introduces a comprehensive benchmark to better characterize the types of tabular datasets where deep learning models excel. The authors evaluate 111 datasets with 20 different models, providing insights into the conditions under which deep learning outperforms traditional methods, which can guide future research and applications.

Authors: Assaf Shmuel, Oren Glickman, Teddy Lazebnik

Link: https://arxiv.org/abs/2408.14817v1

Date: 2024-08-27

Summary:

The analysis of tabular datasets is highly prevalent both in scientific research and real-world applications of Machine Learning (ML). Unlike many other ML tasks, Deep Learning (DL) models often do not outperform traditional methods in this area. Previous comparative benchmarks have shown that DL performance is frequently equivalent or even inferior to models such as Gradient Boosting Machines (GBMs). In this study, we introduce a comprehensive benchmark aimed at better characterizing the types of datasets where DL models excel. Although several important benchmarks for tabular datasets already exist, our contribution lies in the variety and depth of our comparison: we evaluate 111 datasets with 20 different models, including both regression and classification tasks. These datasets vary in scale and include both those with and without categorical variables. Importantly, our benchmark contains a sufficient number of datasets where DL models perform best, allowing for a thorough analysis of the conditions under which DL models excel. Building on the results of this benchmark, we train a model that predicts scenarios where DL models outperform alternative methods with 86.1% accuracy (AUC 0.78). We present insights derived from this characterization and compare these findings to previous benchmarks.

--------------------------------------------------------------------------------------------------------

On-device AI: Quantization-aware Training of Transformers in Time-Series

This research focuses on optimizing the Transformer model for time-series forecasting tasks, with the goal of deploying the model on embedded hardware, such as FPGAs. The work investigates the impact of applying quantization-aware training to the Transformer model to reduce its size and runtime memory footprint while leveraging the advantages of FPGA-based hardware acceleration.

Authors: Tianheng Ling, Gregor Schiele

Link: https://arxiv.org/abs/2408.16495v1

Date: 2024-08-29

Summary:

Artificial Intelligence (AI) models for time-series in pervasive computing keep getting larger and more complicated. The Transformer model is by far the most compelling of these AI models. However, it is difficult to obtain the desired performance when deploying such a massive model on a sensor device with limited resources. My research focuses on optimizing the Transformer model for time-series forecasting tasks. The optimized model will be deployed as hardware accelerators on embedded Field Programmable Gate Arrays (FPGAs). I will investigate the impact of applying Quantization-aware Training to the Transformer model to reduce its size and runtime memory footprint while maximizing the advantages of FPGAs.
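
The core operation in quantization-aware training is usually a "fake quantization" step that rounds values to a low-bit grid during the forward pass while keeping them in floating point. The sketch below shows one common variant, symmetric per-tensor uniform quantization. The abstract does not specify the scheme, bit width, or granularity, so those choices, along with the function name and sample weights, are assumptions.

```python
def fake_quantize(values, num_bits=8):
    """Quantize to a signed integer grid, then dequantize back to floats."""
    qmax = 2 ** (num_bits - 1) - 1  # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)  # nothing to scale
    scale = max_abs / qmax
    # Round to the nearest integer level and clamp to the representable range.
    levels = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return [q * scale for q in levels]


# Hypothetical weights quantized to 4 bits.
weights = [0.5, -1.0, 0.25, 0.0]
print(fake_quantize(weights, num_bits=4))
```

During training, frameworks typically pair this forward-pass rounding with a straight-through gradient estimator so the model learns weights that tolerate the rounding error; that part is omitted here.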

--------------------------------------------------------------------------------------------------------

MODULI: Unlocking Preference Generalization via Diffusion Models for Offline Multi-Objective Reinforcement Learning

This paper proposes MODULI, a novel approach that uses a preference-conditioned diffusion model as a planner to generate trajectories aligned with various preferences in offline multi-objective reinforcement learning. The method aims to improve generalization to out-of-distribution preferences, which is a common challenge in real-world offline datasets.

Authors: Yifu Yuan, Zhenrui Zheng, Zibin Dong, Jianye Hao

Link: https://arxiv.org/abs/2408.15501v1

Date: 2024-08-28

Summary:

Multi-objective Reinforcement Learning (MORL) seeks to develop policies that simultaneously optimize multiple conflicting objectives, but it requires extensive online interactions. Offline MORL provides a promising solution by training on pre-collected datasets to generalize to any preference upon deployment. However, real-world offline datasets are often conservatively and narrowly distributed, failing to comprehensively cover preferences, leading to the emergence of out-of-distribution (OOD) preference areas. Existing offline MORL algorithms exhibit poor generalization to OOD preferences, resulting in policies that do not align with preferences. Leveraging the excellent expressive and generalization capabilities of diffusion models, we propose MODULI (Multi-objective Diffusion Planner with Sliding Guidance), which employs a preference-conditioned diffusion model as a planner to generate trajectories that align with various preferences and derive actions for decision-making. To achieve accurate generation, MODULI introduces two return normalization methods under diverse preferences for refining guidance. To further enhance generalization to OOD preferences, MODULI proposes a novel sliding guidance mechanism, which involves training an additional slider adapter to capture the direction of preference changes. Incorporating the slider, it transitions from in-distribution (ID) preferences to generating OOD preferences, patching and extending the incomplete Pareto front. Extensive experiments on the D4MORL benchmark demonstrate that our algorithm outperforms state-of-the-art Offline MORL baselines, exhibiting excellent generalization to OOD preferences.

--------------------------------------------------------------------------------------------------------

Morphogenesis of sound creates acoustic rainbows

This work utilizes computational morphogenesis to synthesize complex, energy-efficient, wavelength-sized single-material scattering structures that passively decompose radiated sound into its spatio-spectral components. The resulting "acoustic rainbow" structures demonstrate potential applications in transduction, bionics, energy harvesting, communications, and sensing.

Authors: Rasmus E. Christiansen, Ole Sigmund, Efren Fernandez-Grande

Link: https://arxiv.org/abs/2408.14953v1

Date: 2024-08-27

Summary:

Sound is an essential sensing element for many organisms in nature, and multiple species have evolved organic structures that create complex acoustic scattering and dispersion phenomena to emit and perceive sound unambiguously. To date, it has not proven possible to design artificial scattering structures that rival the performance of those found in organic structures. Instead, most sound manipulation relies on active transduction in fluid media rather than on the passive scattering principles often found in nature. In this work, we utilize computational morphogenesis to synthesize complex energy-efficient wavelength-sized single-material scattering structures that passively decompose radiated sound into its spatio-spectral components. Specifically, we tailor an acoustic rainbow structure with "above unity" efficiency and an acoustic wavelength-splitter. Our work paves the way for a new frontier in sound-field engineering, with potential applications in transduction, bionics, energy harvesting, communications and sensing.

--------------------------------------------------------------------------------------------------------

Causal Reasoning in Software Quality Assurance: A Systematic Review

This systematic literature review provides a broad and detailed overview of the use of causal reasoning for software quality assurance activities. The findings highlight the primary areas where causal reasoning has been applied, the predominant methodologies used, and the potential of causal reasoning to improve various software quality attributes, especially during verification, validation, evolution, and maintenance.

Authors: Luca Giamattei, Antonio Guerriero, Roberto Pietrantuono, Stefano Russo

Link: https://arxiv.org/abs/2408.17183v1

Date: 2024-08-30

Summary:

Context: Software Quality Assurance (SQA) is a fundamental part of software engineering to assure stakeholders that software products work as expected after release in operation. Machine Learning (ML) has proven to be able to boost SQA activities and contribute to the development of quality software systems. In this context, Causal Reasoning is gaining increasing interest as a methodology to solve some of the current ML limitations. It aims to go beyond a purely data-driven approach by exploiting the use of causality for more effective SQA strategies. Objective: Provide a broad and detailed overview of the use of causal reasoning for SQA activities, in order to support researchers in accessing this research field, identifying room for application, main challenges and research opportunities. Methods: A systematic literature review of causal reasoning in the SQA research area. Scientific papers have been searched, classified, and analyzed according to established guidelines for software engineering secondary studies. Results: Results highlight the primary areas within SQA where causal reasoning has been applied, the predominant methodologies used, and the level of maturity of the proposed solutions. Fault localization is the activity where causal reasoning is most exploited, especially in the web services/microservices domain, but other tasks like testing are rapidly gaining popularity. Both causal inference and causal discovery are exploited, with Pearl's graphical formulation of causality being preferred, likely due to its intuitiveness. Tools to favour their application are appearing at a fast pace - most of them after 2021. Conclusions: The findings show that causal reasoning is a valuable means for SQA tasks with respect to multiple quality attributes, especially during V&V, evolution and maintenance to ensure reliability, while it is not yet fully exploited for phases like ...

--------------------------------------------------------------------------------------------------------

CoGen: Learning from Feedback with Coupled Comprehension and Generation

This work explores techniques to tightly integrate language comprehension and generation capabilities, enabling a system to continually learn from interaction with users. The authors demonstrate that the coupling of these two capabilities leads to significant performance improvements and a more human-like language output, which can benefit applications that require natural language understanding and generation.

Authors: Mustafa Omer Gul, Yoav Artzi

Link: https://arxiv.org/abs/2408.15992v1

Date: 2024-08-28

Summary:

Systems with both language comprehension and generation capabilities can benefit from the tight connection between the two. This work studies coupling comprehension and generation with a focus on continually learning from interaction with users. We propose techniques to tightly integrate the two capabilities for both learning and inference. We situate our studies in two-player reference games, and deploy various models for thousands of interactions with human users, while learning from interaction feedback signals. We show dramatic improvements in performance over time, with comprehension-generation coupling leading to performance improvements up to 26% in absolute terms and up to 17% higher accuracies compared to a non-coupled system. Our analysis also shows coupling has substantial qualitative impact on the system's language, making it significantly more human-like.

--------------------------------------------------------------------------------------------------------

Artificial Intelligence, Research Watch | Craig Smith | September 2, 2024