Week Ending 10.6.2024

 

RESEARCH WATCH: 10.6.2024

 

Latent Action Priors From a Single Gait Cycle Demonstration for Online Imitation Learning

This paper addresses a key challenge in robotics: creating more natural and adaptable gaits for legged robots. The researchers propose using latent action priors learned from a single gait cycle demonstration to guide deep reinforcement learning. This approach aims to overcome the brittleness and unrealistic outcomes often seen in simulated learning environments. By combining these priors with style rewards, the method achieves performance surpassing expert demonstrations and improves transfer to new tasks, even enabling gait transitions at higher speeds. This research could significantly advance the development of more agile and robust legged robots for applications in search and rescue, exploration, and service robotics.

Authors:  Oliver Hausdörfer, Alexander von Rohr, Éric Lefort, Angela Schoellig

Link:  https://arxiv.org/abs/2410.03246v1

Date: 2024-10-04

Summary:

Deep Reinforcement Learning (DRL) in simulation often results in brittle and unrealistic learning outcomes. To push the agent towards more desirable solutions, prior information can be injected into the learning process through, for instance, reward shaping, expert data, or motion primitives. We propose an additional inductive bias for robot learning: latent actions learned from expert demonstration as priors in the action space. We show that these action priors can be learned from only a single open-loop gait cycle using a simple autoencoder. Using these latent action priors combined with established style rewards for imitation in DRL achieves performance above the expert demonstration level and leads to more desirable gaits. Further, action priors substantially improve the performance on transfer tasks, even leading to gait transitions for higher target speeds. Videos and code are available at https://sites.google.com/view/latent-action-priors.
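
To make the idea concrete, here is a minimal PyTorch sketch of the kind of latent action prior the abstract describes: a small autoencoder fit to a single demonstrated gait cycle, whose frozen decoder then maps a policy's low-dimensional latent actions to joint targets. Dimensions, network sizes, and names are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn as nn

# Stand-in for one recorded gait cycle: T timesteps x D joint targets (assumed shapes).
gait_cycle = torch.randn(50, 12)

class ActionAutoencoder(nn.Module):
    def __init__(self, act_dim=12, latent_dim=4):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(act_dim, 32), nn.Tanh(), nn.Linear(32, latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 32), nn.Tanh(), nn.Linear(32, act_dim))
    def forward(self, a):
        return self.dec(self.enc(a))

ae = ActionAutoencoder()
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
for _ in range(2000):                         # fit the autoencoder on the single cycle
    loss = ((ae(gait_cycle) - gait_cycle) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# During DRL, the policy would output a latent action z; the frozen decoder maps it to
# joint targets, biasing exploration toward the demonstrated gait manifold.
z = torch.zeros(4)
joint_targets = ae.dec(z)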

--------------------------------------------------------------------------------------------------------

Dissipative Avoidance Feedback for Reactive Navigation Under Second-Order Dynamics

Autonomous robot navigation in unknown, obstacle-filled environments is a critical challenge in robotics. This paper introduces DAF (Dissipative Avoidance Feedback), a novel approach that considers both position and velocity in obstacle avoidance, unlike traditional Artificial Potential Field methods. DAF's continuously differentiable controller ensures smoother, more natural navigation while guaranteeing collision-free movement. Designed for real-time implementation using only local sensor data, this method is particularly valuable for robots operating in unknown spaces. The research has potential applications in autonomous vehicles, warehouse robotics, and search and rescue operations, where efficient and safe navigation in complex, dynamic environments is crucial.

Authors:  Lyes Smaili, Zhiqi Tang, Soulaimane Berkane, Tarek Hamel

Link:  https://arxiv.org/abs/2410.02903v1

Date: 2024-10-03

Summary:

This paper introduces DAF (Dissipative Avoidance Feedback), a novel approach for autonomous robot navigation in unknown, obstacle-filled environments with second-order dynamics. Unlike traditional APF (Artificial Potential Field) methods, which rely on repulsive forces based solely on position, DAF employs a dissipative feedback mechanism that adjusts the robot's motion in response to both its position and velocity, ensuring smoother, more natural obstacle avoidance. The proposed continuously differentiable controller solves the motion-to-goal problem while guaranteeing collision-free navigation by considering the robot's state and local obstacle distance information. We show that the controller guarantees safe navigation in generic n-dimensional environments and that all undesired ω-limit points are unstable under certain "controlled" curvature conditions. Designed for real-time implementation, DAF requires only locally measured data from limited-range sensors (e.g., LiDAR, depth cameras), making it particularly effective for robots navigating unknown workspaces.
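
The sketch below illustrates the general idea of dissipative, velocity-aware avoidance under second-order dynamics. The control law, gains, and obstacle model here are illustrative assumptions and not the controller derived in the paper.

import numpy as np

def daf_like_control(x, v, x_goal, obstacle_dist, obstacle_dir,
                     kp=1.0, kd=2.0, d_safe=1.0):
    """Illustrative second-order controller: goal attraction with damping, plus a
    dissipative term that removes velocity directed toward a nearby obstacle.
    A sketch of the idea only, not the paper's control law."""
    u = -kp * (x - x_goal) - kd * v                      # attraction to the goal, damped
    if obstacle_dist < d_safe:
        v_toward = max(np.dot(v, obstacle_dir), 0.0)     # velocity component toward obstacle
        u -= (d_safe - obstacle_dist) / obstacle_dist * v_toward * obstacle_dir
    return u

# One Euler step of the second-order dynamics with an obstacle straight ahead.
x, v = np.array([0.0, 0.0]), np.array([0.5, 0.0])
u = daf_like_control(x, v, x_goal=np.array([5.0, 0.0]),
                     obstacle_dist=0.8, obstacle_dir=np.array([1.0, 0.0]))
dt = 0.01
v = v + dt * u
x = x + dt * v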

--------------------------------------------------------------------------------------------------------

FAN: Fourier Analysis Networks

This paper introduces FAN, a novel neural network architecture based on Fourier Analysis, addressing the limitations of current neural networks in modeling and reasoning about periodicity. By integrating Fourier Series into the network structure, FAN achieves more accurate expression and prediction of periodic patterns. This innovation has broad implications across various fields, including time series forecasting, symbolic formula representation, and language modeling. FAN's ability to efficiently capture and reason about periodic phenomena could enhance predictive models in fields such as climate science, financial forecasting, and signal processing, potentially leading to more accurate and interpretable models in these domains.

Authors:  Yihong Dong, Ge Li, Yongding Tao, Xue Jiang, Kechi Zhang, Jia Li, Jing Su, Jun Zhang, Jingjing Xu

Link:  https://arxiv.org/abs/2410.02675v1

Date: 2024-10-03

Summary:

Despite the remarkable success achieved by neural networks, particularly those represented by the MLP and the Transformer, we reveal that they exhibit potential flaws in modeling and reasoning about periodicity, i.e., they tend to memorize periodic data rather than genuinely understand the underlying principles of periodicity. However, periodicity is a crucial trait in various forms of reasoning and generalization, underpinning predictability across natural and engineered systems through recurring patterns in observations. In this paper, we propose FAN, a novel network architecture based on Fourier Analysis, which enables efficient modeling of and reasoning about periodic phenomena. By introducing the Fourier series, periodicity is naturally integrated into the structure and computational processes of the neural network, achieving a more accurate expression and prediction of periodic patterns. As a promising substitute for the multi-layer perceptron (MLP), FAN can seamlessly replace MLP in various models with fewer parameters and FLOPs. Through extensive experiments, we demonstrate the effectiveness of FAN in modeling and reasoning about periodic functions, and the superiority and generalizability of FAN across a range of real-world tasks, including symbolic formula representation, time series forecasting, and language modeling.
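
A minimal sketch of a FAN-style layer, assuming (as the abstract describes) that part of the layer's output is expressed through cos/sin of a learned projection while the rest passes through a standard nonlinear projection; the exact split and activation may differ from the paper.

import torch
import torch.nn as nn

class FANLayerSketch(nn.Module):
    """Sketch of a Fourier-Analysis-Network style layer: a fraction of the output is
    built from cos/sin of a learned projection (periodic part), the rest from an
    ordinary nonlinear projection. Split ratio and activation are assumptions."""
    def __init__(self, d_in, d_out, periodic_frac=0.25):
        super().__init__()
        self.d_p = int(d_out * periodic_frac)            # width of the periodic part
        self.w_p = nn.Linear(d_in, self.d_p, bias=False)
        self.w_g = nn.Linear(d_in, d_out - 2 * self.d_p)
        self.act = nn.GELU()
    def forward(self, x):
        p = self.w_p(x)
        return torch.cat([torch.cos(p), torch.sin(p), self.act(self.w_g(x))], dim=-1)

layer = FANLayerSketch(d_in=64, d_out=64)
y = layer(torch.randn(8, 64))                            # output shape: (8, 64)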

--------------------------------------------------------------------------------------------------------

Contextual Document Embeddings

This research proposes a new approach to dense document embeddings, which are crucial for neural retrieval systems. The authors argue that traditional methods produce out-of-context embeddings and introduce two methods for creating contextualized document embeddings. These methods incorporate information from neighboring documents, similar to contextualized word embeddings. The approach achieves state-of-the-art results on the MTEB benchmark without complex techniques like hard negative mining or dataset-specific instructions. This advancement could significantly improve information retrieval systems, enhancing search engines, recommendation systems, and document classification tasks across various industries, from digital libraries to e-commerce platforms.

Authors:  John X. Morris, Alexander M. Rush

Link:  https://arxiv.org/abs/2410.02525v1

Date: 2024-10-03

Summary:

Dense document embeddings are central to neural retrieval. The dominant paradigm is to train and construct embeddings by running encoders directly on individual documents. In this work, we argue that these embeddings, while effective, are implicitly out-of-context for targeted use cases of retrieval, and that a contextualized document embedding should take into account both the document and neighboring documents in context - analogous to contextualized word embeddings. We propose two complementary methods for contextualized document embeddings: first, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss; second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation. Results show that both methods achieve better performance than biencoders in several settings, with differences especially pronounced out-of-domain. We achieve state-of-the-art results on the MTEB benchmark with no hard negative mining, score distillation, dataset-specific instructions, intra-GPU example-sharing, or extremely large batch sizes. Our method can be applied to improve performance on any contrastive learning dataset and any biencoder.
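
For reference, the sketch below shows a standard in-batch contrastive (InfoNCE) objective over query/document embedding pairs. The paper's first method changes how such batches are assembled (from neighboring documents) and its second adds a neighbor-encoding architecture; both are assumed to happen outside this function.

import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_emb, doc_emb, temperature=0.05):
    """Standard in-batch contrastive loss: each query should score highest against
    its own document (the diagonal), with the rest of the batch as negatives.
    Constructing the batch from neighboring documents is assumed upstream."""
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    logits = q @ d.T / temperature                       # (B, B) similarity matrix
    labels = torch.arange(q.size(0))                     # matching pairs on the diagonal
    return F.cross_entropy(logits, labels)

loss = in_batch_contrastive_loss(torch.randn(16, 256), torch.randn(16, 256))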

--------------------------------------------------------------------------------------------------------

Tracking objects that change in appearance with phase synchrony

This paper explores how biological visual systems track objects that change appearance over time, proposing a computational model based on neural synchrony. The researchers introduce a complex-valued recurrent neural network (CV-RNN) that can control attention to features separately from their location. Through comparisons with human performance and other deep neural networks on the FeatureTracker challenge, the CV-RNN demonstrates human-like ability in tracking objects with changing appearances. This research has potential applications in computer vision systems for autonomous vehicles, surveillance, and augmented reality, where robust object tracking under varying conditions is crucial.

Authors:  Sabine Muzellec, Drew Linsley, Alekh K. Ashok, Ennio Mingolla, Girik Malik, Rufin VanRullen, Thomas Serre

Link:  https://arxiv.org/abs/2410.02094v1

Date: 2024-10-02

Summary:

Objects we encounter often change appearance as we interact with them. Changes in illumination (shadows), object pose, or movement of nonrigid objects can drastically alter available image features. How do biological visual systems track objects as they change? It may involve specific attentional mechanisms for reasoning about the locations of objects independently of their appearances -- a capability that prominent neuroscientific theories have associated with computing through neural synchrony. We computationally test the hypothesis that the implementation of visual attention through neural synchrony underlies the ability of biological visual systems to track objects that change in appearance over time. We first introduce a novel deep learning circuit that can learn to precisely control attention to features separately from their location in the world through neural synchrony: the complex-valued recurrent neural network (CV-RNN). Next, we compare object tracking in humans, the CV-RNN, and other deep neural networks (DNNs), using FeatureTracker: a large-scale challenge that asks observers to track objects as their locations and appearances change in precisely controlled ways. While humans effortlessly solved FeatureTracker, state-of-the-art DNNs did not. In contrast, our CV-RNN behaved similarly to humans on the challenge, providing a computational proof-of-concept for the role of phase synchronization as a neural substrate for tracking appearance-morphing objects as they move about.
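
A minimal complex-valued recurrent cell in PyTorch, shown only to illustrate how a unit's magnitude can carry feature evidence while its phase can express grouping or synchrony. The cell and its nonlinearity are assumptions, not the paper's full CV-RNN circuit.

import torch
import torch.nn as nn

class CVRNNCellSketch(nn.Module):
    """Illustrative complex-valued recurrent cell with a magnitude nonlinearity
    that preserves phase, so phase relationships (synchrony) can encode grouping."""
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.w_in = nn.Parameter(torch.randn(d_in, d_hidden, dtype=torch.cfloat) * 0.1)
        self.w_rec = nn.Parameter(torch.randn(d_hidden, d_hidden, dtype=torch.cfloat) * 0.1)
    def forward(self, x_seq):                            # x_seq: (T, B, d_in), complex
        h = torch.zeros(x_seq.size(1), self.w_rec.size(0), dtype=torch.cfloat)
        for x_t in x_seq:
            z = x_t @ self.w_in + h @ self.w_rec
            h = torch.tanh(z.abs()) * torch.exp(1j * z.angle())   # squash magnitude, keep phase
        return h

cell = CVRNNCellSketch(d_in=8, d_hidden=16)
h_final = cell(torch.randn(20, 4, 8, dtype=torch.cfloat))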

--------------------------------------------------------------------------------------------------------

Performant, Memory Efficient and Scalable Multi-Agent Reinforcement Learning

The authors introduce Sable, a novel algorithm for multi-agent reinforcement learning (MARL) that addresses the challenges of performance, memory efficiency, and scalability. By adapting the retention mechanism from Retentive Networks, Sable achieves superior performance in diverse environments while maintaining efficiency with large numbers of agents. This research has significant implications for complex multi-agent systems, such as traffic management, swarm robotics, and large-scale simulations. Sable's ability to handle environments with over a thousand agents could enable more realistic and sophisticated simulations for urban planning, logistics optimization, and collaborative robotics.

Authors:  Omayma Mahjoub, Sasha Abramowitz, Ruan de Kock, Wiem Khlifi, Simon du Toit, Jemma Daniel, Louay Ben Nessir, Louise Beyers, Claude Formanek, Liam Clark, Arnu Pretorius

Link:  https://arxiv.org/abs/2410.01706v1

Date: 2024-10-02

Summary:

As the field of multi-agent reinforcement learning (MARL) progresses towards larger and more complex environments, achieving strong performance while maintaining memory efficiency and scalability to many agents becomes increasingly important. Although recent research has led to several advanced algorithms, to date, none fully address all of these key properties simultaneously. In this work, we introduce Sable, a novel and theoretically sound algorithm that adapts the retention mechanism from Retentive Networks to MARL. Sable's retention-based sequence modelling architecture allows for computationally efficient scaling to a large number of agents, as well as maintaining a long temporal context, making it well-suited for large-scale partially observable environments. Through extensive evaluations across six diverse environments, we demonstrate how Sable is able to significantly outperform existing state-of-the-art methods in the majority of tasks (34 out of 45, roughly 75%). Furthermore, Sable demonstrates stable performance as we scale the number of agents, handling environments with more than a thousand agents while exhibiting a linear increase in memory usage. Finally, we conduct ablation studies to isolate the source of Sable's performance gains and confirm its efficient computational memory usage. Our results highlight Sable's performance and efficiency, positioning it as a leading approach to MARL at scale.
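
Sable builds on the retention mechanism of Retentive Networks; the sketch below shows the recurrent form of a single retention head, whose constant-size decayed state is what makes memory-efficient scaling possible. Shapes and the decay value are illustrative.

import torch

def retention_recurrent(q, k, v, gamma=0.9):
    """Recurrent form of one retention head (as in Retentive Networks, which Sable
    adapts to MARL): a decayed state matrix accumulates key-value outer products,
    so memory stays constant in sequence length."""
    T, d = q.shape
    S = torch.zeros(d, v.shape[1])                       # state: (d_key, d_value)
    outputs = []
    for t in range(T):
        S = gamma * S + torch.outer(k[t], v[t])          # decay old context, add the new step
        outputs.append(q[t] @ S)
    return torch.stack(outputs)

T, d_k, d_v = 16, 32, 32
out = retention_recurrent(torch.randn(T, d_k), torch.randn(T, d_k), torch.randn(T, d_v))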

--------------------------------------------------------------------------------------------------------

Transformers Handle Endogeneity in In-Context Linear Regression

This paper explores the capability of transformer models to address endogeneity in linear regression tasks. The researchers demonstrate that transformers can effectively use instrumental variables to handle endogeneity, emulating a gradient-based bi-level optimization procedure. They also propose an in-context pretraining scheme with theoretical guarantees. This work has important implications for econometrics and causal inference, potentially improving the accuracy and reliability of regression analyses in fields such as economics, social sciences, and policy research. The ability to handle endogeneity in-context could lead to more robust and adaptable statistical models in various real-world applications.

Authors:  Haodong Liang, Krishnakumar Balasubramanian, Lifeng Lai

Link:  https://arxiv.org/abs/2410.01265v1

Date: 2024-10-02

Summary:

We explore the capability of transformers to address endogeneity in in-context linear regression. Our main finding is that transformers inherently possess a mechanism to handle endogeneity effectively using instrumental variables (IV). First, we demonstrate that the transformer architecture can emulate a gradient-based bi-level optimization procedure that converges to the widely used two-stage least squares (2SLS) solution at an exponential rate. Next, we propose an in-context pretraining scheme and provide theoretical guarantees showing that the global minimizer of the pre-training loss achieves a small excess loss. Our extensive experiments validate these theoretical findings, showing that the trained transformer provides more robust and reliable in-context predictions and coefficient estimates than the 2SLS method in the presence of endogeneity.
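
For context, the classical two-stage least squares estimator that the transformer is shown to converge toward can be written in a few lines of NumPy; the toy data-generating process below is an assumption for illustration.

import numpy as np

def two_stage_least_squares(y, X, Z):
    """Classical 2SLS: project the endogenous regressors X onto the instrument space Z,
    then regress y on the fitted values."""
    P = Z @ np.linalg.pinv(Z.T @ Z) @ Z.T                # projection onto the instruments
    X_hat = P @ X                                         # first stage: fitted X
    return np.linalg.pinv(X_hat.T @ X_hat) @ X_hat.T @ y  # second stage

# Toy example: one endogenous regressor, one instrument, true coefficient 2.0.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(n, 1))
u = rng.normal(size=(n, 1))                               # unobserved confounder
x = z + u + 0.1 * rng.normal(size=(n, 1))
y = 2.0 * x + u + 0.1 * rng.normal(size=(n, 1))
print(two_stage_least_squares(y, x, z))                   # close to 2.0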

--------------------------------------------------------------------------------------------------------

Were RNNs All We Needed?

This paper revisits traditional recurrent neural networks (RNNs), specifically LSTMs and GRUs, in light of recent advancements in sequence modeling. By modifying these models to remove hidden state dependencies from certain gates, the authors create minimal versions (minLSTMs and minGRUs) that are fully parallelizable during training and use fewer parameters. These stripped-down versions match the performance of recent sequence models while being significantly faster to train. This research could have far-reaching implications for natural language processing, time series analysis, and other sequence modeling tasks, potentially leading to more efficient and scalable models for applications like machine translation, speech recognition, and predictive maintenance.

Authors:  Leo Feng, Frederick Tung, Mohamed Osama Ahmed, Yoshua Bengio, Hossein Hajimirsadegh

Link:  https://arxiv.org/abs/2410.01201v2

Date: 2024-10-04

Summary:

The scalability limitations of Transformers regarding sequence length have renewed interest in recurrent sequence models that are parallelizable during training. As a result, many novel recurrent architectures, such as S4, Mamba, and Aaren, have been proposed that achieve comparable performance. In this work, we revisit traditional recurrent neural networks (RNNs) from over a decade ago: LSTMs (1997) and GRUs (2014). While these models were slow due to the need to backpropagate through time (BPTT), we show that by removing the hidden state dependencies from their input, forget, and update gates, LSTMs and GRUs no longer require BPTT and can be efficiently trained in parallel. Building on this, we introduce minimal versions (minLSTMs and minGRUs) that (1) use significantly fewer parameters than their traditional counterparts and (2) are fully parallelizable during training (175x faster for a sequence of length 512). Lastly, we show that these stripped-down versions of decade-old RNNs match the empirical performance of recent sequence models.
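
A sketch of a minGRU-style cell in its sequential form: because the gate and candidate depend only on the current input, the recurrence becomes a gated running average that can also be computed with a parallel scan during training (the scan itself is not shown here).

import torch
import torch.nn as nn

class MinGRUSketch(nn.Module):
    """minGRU-style cell, sequential form: gate z_t and candidate h~_t are functions
    of x_t alone (no dependence on h_{t-1}), which is what enables parallel training."""
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.to_z = nn.Linear(d_in, d_hidden)            # update gate from input only
        self.to_h = nn.Linear(d_in, d_hidden)            # candidate state from input only
    def forward(self, x_seq):                            # x_seq: (T, B, d_in)
        h = torch.zeros(x_seq.size(1), self.to_z.out_features)
        for x_t in x_seq:
            z = torch.sigmoid(self.to_z(x_t))
            h = (1 - z) * h + z * self.to_h(x_t)
        return h

out = MinGRUSketch(d_in=32, d_hidden=64)(torch.randn(10, 4, 32))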

--------------------------------------------------------------------------------------------------------

Explainable Diagnosis Prediction through Neuro-Symbolic Integration

This study explores the use of Logical Neural Networks (LNNs) for developing explainable models in medical diagnosis prediction. By integrating domain-specific knowledge through logical rules with learnable thresholds, the researchers create models that outperform traditional machine learning approaches while providing interpretable insights. This work addresses the critical need for transparency in healthcare AI applications, potentially advancing precision medicine and supporting equitable healthcare solutions. The approach could be applied to various medical conditions, improving diagnostic accuracy and patient trust in AI-assisted healthcare decisions.

Authors:  Qiuhao Lu, Rui Li, Elham Sagheb, Andrew Wen, Jinlian Wang, Liwei Wang, Jungwei W. Fan, Hongfang Liu

Link:  https://arxiv.org/abs/2410.01855v1

Date: 2024-10-01

Summary:

Diagnosis prediction is a critical task in healthcare, where timely and accurate identification of medical conditions can significantly impact patient outcomes. Traditional machine learning and deep learning models have achieved notable success in this domain but often lack interpretability, which is a crucial requirement in clinical settings. In this study, we explore the use of neuro-symbolic methods, specifically Logical Neural Networks (LNNs), to develop explainable models for diagnosis prediction. Essentially, we design and implement LNN-based models that integrate domain-specific knowledge through logical rules with learnable thresholds. Our models, particularly M_multi-pathway and M_comprehensive, demonstrate superior performance over traditional models such as Logistic Regression, SVM, and Random Forest, achieving higher accuracy (up to 80.52%) and AUROC scores (up to 0.8457) in the case study of diabetes prediction. The learned weights and thresholds within the LNN models provide direct insights into feature contributions, enhancing interpretability without compromising predictive power. These findings highlight the potential of neuro-symbolic approaches in bridging the gap between accuracy and explainability in healthcare AI applications. By offering transparent and adaptable diagnostic models, our work contributes to the advancement of precision medicine and supports the development of equitable healthcare solutions. Future research will focus on extending these methods to larger and more diverse datasets to further validate their applicability across different medical conditions and populations.
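
The sketch below is an illustrative differentiable rule with learnable thresholds and a soft conjunction, intended only to convey the neuro-symbolic idea of interpretable, trainable thresholds; it is not the LNN formulation or the specific models used in the study.

import torch
import torch.nn as nn

class SoftRule(nn.Module):
    """Illustrative differentiable rule of the form
    IF (feature_1 > t_1) AND (feature_2 > t_2) AND ... THEN positive,
    with learnable thresholds t_i and a soft (product) conjunction."""
    def __init__(self, n_features, sharpness=10.0):
        super().__init__()
        self.thresholds = nn.Parameter(torch.zeros(n_features))
        self.sharpness = sharpness
    def forward(self, x):
        truth = torch.sigmoid(self.sharpness * (x - self.thresholds))  # per-feature truth values
        return truth.prod(dim=-1)                                       # soft AND

rule = SoftRule(n_features=3)
p = rule(torch.tensor([[0.7, 0.2, 0.9]]))               # rule satisfaction in [0, 1]
# After training, the learned thresholds can be read off directly, which is the
# kind of interpretability the abstract emphasizes.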

--------------------------------------------------------------------------------------------------------

ReXplain: Translating Radiology into Patient-Friendly Video Reports

ReXplain is an innovative AI-driven system that generates patient-friendly video reports from radiology findings. By integrating a large language model for text simplification, an image segmentation model, and an avatar generation tool, ReXplain produces comprehensive explanations with plain language, highlighted imagery, and 3D organ renderings. This system has the potential to revolutionize patient communication in radiology, improving patient engagement and satisfaction. ReXplain could be particularly valuable in telemedicine, patient education, and improving health literacy, ultimately leading to better-informed patients and potentially improved health outcomes.

Authors:  Luyang Luo, Jenanan Vairavamurthy, Xiaoman Zhang, Abhinav Kumar, Ramon R. Ter-Oganesyan, Stuart T. Schroff, Dan Shilo, Rydhwana Hossain, Mike Moritz, Pranav Rajpurkar

Link:  https://arxiv.org/abs/2410.00441v1

Date: 2024-10-01

Summary:

Radiology reports often remain incomprehensible to patients, undermining patient-centered care. We present ReXplain (Radiology eXplanation), an innovative AI-driven system that generates patient-friendly video reports for radiology findings. ReXplain uniquely integrates a large language model for text simplification, an image segmentation model for anatomical region identification, and an avatar generation tool, producing comprehensive explanations with plain language, highlighted imagery, and 3D organ renderings. Our proof-of-concept study with five board-certified radiologists indicates that ReXplain could accurately deliver radiological information and effectively simulate one-on-one consultations. This work demonstrates a new paradigm in AI-assisted medical communication, potentially improving patient engagement and satisfaction in radiology care, and opens new avenues for research in multimodal medical communication.

--------------------------------------------------------------------------------------------------------

Beyond Scores: A Modular RAG-Based System for Automatic Short Answer Scoring with Feedback

This paper introduces a novel approach to Automatic Short Answer Scoring with Feedback (ASAS-F) using a modular retrieval augmented generation (RAG) based system. The proposed method operates in zero-shot and few-shot learning scenarios, offering improved scoring accuracy and detailed feedback without extensive fine-tuning or prompt engineering. This research has significant implications for educational technology, potentially reducing the grading burden on educators while providing more consistent and detailed feedback to students. The system's adaptability to various educational tasks makes it a promising tool for large-scale online education platforms, standardized testing, and personalized learning environments.

Authors:  Menna Fateen, Bo Wang, Tsunenori Mine

Link:  https://arxiv.org/abs/2409.20042v1

Date: 2024-09-30

Summary:

Automatic short answer scoring (ASAS) helps reduce the grading burden on educators but often lacks detailed, explainable feedback. Existing methods in ASAS with feedback (ASAS-F) rely on fine-tuning language models with limited datasets, which is resource-intensive and struggles to generalize across contexts. Recent approaches using large language models (LLMs) have focused on scoring without extensive fine-tuning. However, they often rely heavily on prompt engineering and either fail to generate elaborated feedback or do not adequately evaluate it. In this paper, we propose a modular retrieval augmented generation based ASAS-F system that scores answers and generates feedback in strict zero-shot and few-shot learning scenarios. We design our system to be adaptable to various educational tasks without extensive prompt engineering using an automatic prompt generation framework. Results show an improvement in scoring accuracy by 9% on unseen questions compared to fine-tuning, offering a scalable and cost-effective solution.
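
A hedged sketch of the retrieval step such a system might use: find the most similar already-scored answers and insert them into the prompt as few-shot examples. The answer bank, prompt wording, TF-IDF retriever, and the downstream LLM call are all illustrative assumptions, not the paper's pipeline.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical bank of already-scored answers (text, score).
scored_bank = [
    ("Photosynthesis converts light energy into chemical energy.", 2),
    ("Plants eat sunlight.", 1),
    ("It is when plants make food from water only.", 0),
]
student_answer = "Plants use light to produce glucose and oxygen."

# Retrieve the two most similar scored answers as few-shot examples.
vec = TfidfVectorizer().fit([a for a, _ in scored_bank] + [student_answer])
sims = cosine_similarity(vec.transform([student_answer]),
                         vec.transform([a for a, _ in scored_bank]))[0]
top = np.argsort(sims)[::-1][:2]

examples = "\n".join(f"Answer: {scored_bank[i][0]}\nScore: {scored_bank[i][1]}" for i in top)
prompt = (f"Score the student answer from 0-2 and give brief feedback.\n\n"
          f"{examples}\n\nAnswer: {student_answer}\nScore:")
# The assembled prompt would then be sent to an LLM to produce a score and feedback.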

--------------------------------------------------------------------------------------------------------

Ward: Provable RAG Dataset Inference via LLM Watermarks

Ward addresses the critical issue of unauthorized use of content in Retrieval-Augmented Generation (RAG) systems. By formalizing the problem as RAG Dataset Inference (RAG-DI) and introducing a novel benchmarking dataset, the researchers provide a foundation for studying this challenge. Ward, based on LLM watermarks, offers data owners rigorous statistical guarantees regarding the usage of their dataset in RAG systems. This research has important implications for data privacy, intellectual property protection, and ethical AI development. Ward could be particularly valuable in industries where proprietary information is crucial, such as legal, medical, and financial sectors, ensuring responsible use of data in AI-powered systems.

Authors:  Nikola Jovanović, Robin Staab, Maximilian Baader, Martin Vechev

Link:  https://arxiv.org/abs/2410.03537v1

Date: 2024-10-04

Summary:

Retrieval-Augmented Generation (RAG) improves LLMs by enabling them to incorporate external data during generation. This raises concerns for data owners regarding unauthorized use of their content in RAG systems. Despite its importance, the challenge of detecting such unauthorized usage remains underexplored, with existing datasets and methodologies from adjacent fields being ill-suited for its study. In this work, we take several steps to bridge this gap. First, we formalize this problem as (black-box) RAG Dataset Inference (RAG-DI). To facilitate research on this challenge, we further introduce a novel dataset specifically designed for benchmarking RAG-DI methods under realistic conditions, and propose a set of baseline approaches. Building on this foundation, we introduce Ward, a RAG-DI method based on LLM watermarks that enables data owners to obtain rigorous statistical guarantees regarding the usage of their dataset in a RAG system. In our experimental evaluation, we show that Ward consistently outperforms all baselines across many challenging settings, achieving higher accuracy, superior query efficiency and robustness. Our work provides a foundation for future studies of RAG-DI and highlights LLM watermarks as a promising approach to this problem.
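
For background, red/green-list LLM watermarks support a simple one-proportion z-test of the kind sketched below; this illustrates the sort of statistical guarantee such watermarks enable, not Ward's specific inference procedure.

import math

def greenlist_z_score(tokens, is_green, gamma=0.5):
    """One-proportion z-test used by red/green-list LLM watermarks: under the null
    (no watermark), roughly a fraction gamma of tokens falls in the green list; a
    large z indicates watermarked (i.e., reused) content."""
    n = len(tokens)
    g = sum(1 for t in tokens if is_green(t))
    return (g - gamma * n) / math.sqrt(n * gamma * (1 - gamma))

# Toy usage: pretend even token ids are "green".
z = greenlist_z_score(tokens=[2, 4, 7, 8, 10, 12, 13, 16], is_green=lambda t: t % 2 == 0)
# A z-score of roughly 4 or more corresponds to a very small p-value, which is the
# kind of rigorous evidence of dataset usage a data owner would want.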

--------------------------------------------------------------------------------------------------------

A Likelihood Based Approach to Distribution Regression Using Conditional Deep Generative Models

This paper explores the theoretical properties of conditional deep generative models in the context of distribution regression. The researchers study a likelihood-based approach for estimating these models, providing convergence rates for a sieve maximum likelihood estimator. This work offers insights into why conditional deep generative models can effectively learn a broad class of nearly singular conditional distributions without suffering from the curse of dimensionality. The findings have potential applications in various fields requiring complex distribution modeling, such as climate science, econometrics, and biostatistics, potentially leading to more accurate and efficient predictive models in these domains.

Authors:  Shivam Kumar, Yun Yang, Lizhen Lin

Link:  https://arxiv.org/abs/2410.02025v1

Date: 2024-10-02

Summary:

In this work, we explore the theoretical properties of conditional deep generative models under the statistical framework of distribution regression where the response variable lies in a high-dimensional ambient space but concentrates around a potentially lower-dimensional manifold. More specifically, we study the large-sample properties of a likelihood-based approach for estimating these models. Our results lead to the convergence rate of a sieve maximum likelihood estimator (MLE) for estimating the conditional distribution (and its devolved counterpart) of the response given predictors in the Hellinger (Wasserstein) metric. Our rates depend solely on the intrinsic dimension and smoothness of the true conditional distribution. These findings provide an explanation of why conditional deep generative models can circumvent the curse of dimensionality from the perspective of statistical foundations and demonstrate that they can learn a broader class of nearly singular conditional distributions. Our analysis also emphasizes the importance of introducing a small noise perturbation to the data when they are supported sufficiently close to a manifold. Finally, in our numerical studies, we demonstrate the effective implementation of the proposed approach using both synthetic and real-world datasets, which also provide complementary validation to our theoretical findings.

--------------------------------------------------------------------------------------------------------

Label-Free Subjective Player Experience Modelling via Let's Play Videos

This research proposes a novel approach to Player Experience Modelling (PEM) by approximating player experience from gameplay videos without the need for expert hand-authoring or specialized data collection. The method is evaluated through a human subject study predicting affect in the game Angry Birds. This approach could revolutionize game design and personalization, allowing developers to gain insights into player experiences more efficiently and at a larger scale. The technique has potential applications beyond gaming, such as in user experience research for software applications, educational technology, and interactive media, offering a less intrusive way to understand user engagement and emotional responses.

Authors:  Dave Goel, Athar Mahmoudi-Nejad, Matthew Guzdial

Link:  https://arxiv.org/abs/2410.02967v1

Date: 2024-10-03

Summary:

Player Experience Modelling (PEM) is the study of AI techniques applied to modelling a player's experience within a video game. PEM development can be labour-intensive, requiring expert hand-authoring or specialized data collection. In this work, we propose a novel PEM development approach, approximating player experience from gameplay video. We evaluate this approach by predicting affect in the game Angry Birds via a human subject study. We validate that our PEM can strongly correlate with self-reported and sensor measures of affect, demonstrating the potential of this approach.

--------------------------------------------------------------------------------------------------------

Selective Attention Improves Transformer

This paper introduces Selective Attention, a simple parameter-free modification to the standard attention mechanism in transformer models. By reducing attention to unneeded elements, this approach improves language modeling performance across various model sizes and context lengths. Selective Attention allows for significant reductions in memory and compute requirements during inference, potentially leading to more efficient and scalable language models. This research could have wide-ranging implications for natural language processing applications, including machine translation, text summarization, and question-answering systems, enabling more powerful models to run on resource-constrained devices.

Authors:  Yaniv Leviathan, Matan Kalman, Yossi Matias

Link:  https://arxiv.org/abs/2410.02703v1

Date: 2024-10-03

Summary:

Unneeded elements in the attention's context degrade performance. We introduce Selective Attention, a simple parameter-free change to the standard attention mechanism which reduces attention to unneeded elements. Selective attention improves language modeling performance in a variety of model sizes and context lengths. For example, a range of transformers trained with the language modeling objective on C4 with selective attention perform equivalently to standard transformers with ~2X more heads and parameters in their attention modules. Selective attention also allows decreasing the size of the attention's context buffer, leading to meaningful reductions in the memory and compute requirements during inference. For example, transformers with 100M parameters trained on C4 with context sizes of 512, 1,024, and 2,048 need 16X, 25X, and 47X less memory for their attention module, respectively, when equipped with selective attention, than those without selective attention, at the same validation perplexity.
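
The sketch below is a rough, simplified reading of the mechanism: one head's logits are reused as selection scores indicating how much earlier tokens are no longer needed, and the accumulated scores are subtracted from later attention logits before the softmax. Which head is used, the clamping, and the exact accumulation are simplifications; the paper gives the precise formulation.

import torch
import torch.nn.functional as F

def selective_attention_sketch(q, k, v):
    """Hedged sketch of selective attention: accumulate per-token "selection" scores
    (here taken from head 0's causal logits) and subtract them from subsequent
    attention logits, so tokens flagged as no longer needed receive less attention."""
    H, T, d = q.shape                                     # heads, length, head dim
    logits = q @ k.transpose(-1, -2) / d ** 0.5           # (H, T, T)
    causal = torch.tril(torch.ones(T, T)).bool()
    logits = logits.masked_fill(~causal, float("-inf"))
    sel = F.relu(logits[0]).masked_fill(~causal, 0.0)     # selection scores from head 0
    penalty = sel.cumsum(dim=0)                           # accumulate over earlier queries
    penalty = torch.cat([torch.zeros(1, T), penalty[:-1]], dim=0)   # only past selections apply
    adjusted = logits - penalty.unsqueeze(0)              # down-weight "used up" tokens
    return F.softmax(adjusted, dim=-1) @ v

out = selective_attention_sketch(torch.randn(4, 16, 32),
                                 torch.randn(4, 16, 32),
                                 torch.randn(4, 16, 32))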

--------------------------------------------------------------------------------------------------------

nGPT: Normalized Transformer with Representation Learning on the Hypersphere

nGPT presents a novel neural network architecture where all vectors forming the embeddings, MLP, attention matrices, and hidden states are unit norm normalized. This approach, which confines the input stream of tokens to the surface of a hypersphere, demonstrates significantly faster learning, reducing the number of required training steps by a factor of 4 to 20. The research has potential implications for more efficient training of large language models and other transformer-based architectures. This could lead to reduced computational costs and energy consumption in AI model development, making advanced NLP capabilities more accessible and sustainable.

Authors:  Ilya Loshchilov, Cheng-Ping Hsieh, Simeng Sun, Boris Ginsburg

Link:  https://arxiv.org/abs/2410.01131v1

Date: 2024-10-01

Summary:

We propose a novel neural network architecture, the normalized Transformer (nGPT) with representation learning on the hypersphere. In nGPT, all vectors forming the embeddings, MLP, attention matrices and hidden states are unit norm normalized. The input stream of tokens travels on the surface of a hypersphere, with each layer contributing a displacement towards the target output predictions. These displacements are defined by the MLP and attention blocks, whose vector components also reside on the same hypersphere. Experiments show that nGPT learns much faster, reducing the number of training steps required to achieve the same accuracy by a factor of 4 to 20, depending on the sequence length.
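
A minimal sketch of the hypersphere-style update described in the abstract: the hidden state is kept at unit norm, and each block nudges it toward a normalized target before re-normalizing. The fixed step size alpha here stands in for the learnable per-layer parameters of the actual architecture.

import torch
import torch.nn.functional as F

def hypersphere_step(h, block, alpha=0.1):
    """nGPT-style layer update sketch: move the unit-norm state a small step toward
    the normalized block output, then project back onto the hypersphere."""
    target = F.normalize(block(h), dim=-1)
    return F.normalize(h + alpha * (target - h), dim=-1)

h = F.normalize(torch.randn(4, 64), dim=-1)              # unit-norm token states
mlp = torch.nn.Sequential(torch.nn.Linear(64, 256), torch.nn.GELU(), torch.nn.Linear(256, 64))
h = hypersphere_step(h, mlp)                              # still unit norm afterwards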

--------------------------------------------------------------------------------------------------------

Performance Analysis of 6TiSCH Networks Using Discrete Events Simulator

This study evaluates the scalability of the 6TiSCH protocol, which is crucial for long-distance communication in IoT networks. The researchers focus on key parameters such as queue size, maximum number of single-hop retries, and slotframe length, using computational simulations with an open-source simulator. The findings provide insights into optimizing these parameters for reduced packet error rates and improved latency as networks scale. This research has important implications for the deployment of large-scale IoT networks, particularly in industrial IoT, smart cities, and environmental monitoring applications, where reliable long-distance communication is essential.

Authors:  Guilherme de Santi Peron, Marcos Eduardo Pivaro Monteiro, João Luís Verdegay de Barros, Jamil Farhat, Glauber Brante

Link:  https://arxiv.org/abs/2410.03383v1

Date: 2024-10-04

Summary:

The Internet of Things (IoT) empowers small devices to sense, react, and communicate, with applications ranging from smart ordinary household objects to complex industrial processes. To provide access to an increasing number of IoT devices, particularly in long-distance communication scenarios, a robust low-power wide area network (LPWAN) protocol becomes essential. A widely adopted protocol for this purpose is 6TiSCH, which builds upon the IEEE 802.15.4 standard. It introduces time-slotted channel hopping (TSCH) mode as a new medium access control (MAC) layer operating mode, in conjunction with IEEE 802.15.4g, which also defines both MAC and physical layer (PHY) layers and provides IPv6 connectivity for LPWAN. Notably, 6TiSCH has gained adoption in significant standards such as Wireless Intelligent Ubiquitous Networks (Wi-SUN). This study evaluates the scalability of 6TiSCH, with a focus on key parameters such as queue size, the maximum number of single-hop retries, and the slotframe length. Computational simulations performed with an open-source simulator yielded the following results: increasing the transmission queue size, along with adjusting the number of retries and the slotframe length, leads to a reduction in the packet error rate (PER). Notably, the impact of the number of retries is particularly pronounced. Furthermore, the effect on latency varies based on the specific combination of these parameters as the network scales.
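
As a back-of-the-envelope illustration of why the retry limit dominates the packet error rate, the snippet below assumes independent per-attempt losses; it deliberately ignores queue overflow and slotframe scheduling effects, which are exactly what the simulator captures.

# Simplified analytic model: with per-attempt loss probability p and R single-hop
# retries, a packet is lost only if all R + 1 attempts fail. Not the 6TiSCH simulator.
def per_with_retries(p_attempt_loss, max_retries):
    return p_attempt_loss ** (max_retries + 1)

for retries in (0, 3, 7):
    print(retries, per_with_retries(0.3, retries))        # 0.3, 0.0081, ~6.6e-5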

--------------------------------------------------------------------------------------------------------

SEAL: SEmantic-Augmented Imitation Learning via Language Model

SEAL introduces a novel framework for Hierarchical Imitation Learning (HIL) that leverages Large Language Models (LLMs) for specifying sub-goal spaces and pre-labeling states. By combining LLM-guided sub-goal learning with unsupervised Vector Quantization and incorporating a transition-augmented low-level planner, SEAL demonstrates superior performance in complex long-horizon tasks, especially with small expert datasets. This research has potential applications in robotics, autonomous systems, and AI assistants, enabling more efficient learning of complex, multi-step tasks from limited demonstrations. SEAL could significantly advance the development of more versatile and adaptable AI systems capable of handling a wide range of real-world scenarios.

Authors:  Chengyang Gu, Yuxin Pan, Haotian Bai, Hui Xiong, Yize Chen

Link:  https://arxiv.org/abs/2410.02231v1

Date: 2024-10-03

Summary:

Hierarchical Imitation Learning (HIL) is a promising approach for tackling long-horizon decision-making tasks. However, it remains challenging due to the lack of detailed supervisory labels for sub-goal learning and its reliance on hundreds to thousands of expert demonstrations. In this work, we introduce SEAL, a novel framework that leverages the powerful semantic and world knowledge of Large Language Models (LLMs) both to specify the sub-goal space and to pre-label states with semantically meaningful sub-goal representations, without prior knowledge of task hierarchies. SEAL employs a dual-encoder structure, combining supervised LLM-guided sub-goal learning with unsupervised Vector Quantization (VQ) for more robust sub-goal representations. Additionally, SEAL incorporates a transition-augmented low-level planner for improved adaptation to sub-goal transitions. Our experiments demonstrate that SEAL outperforms state-of-the-art HIL methods and LLM-based planning approaches, particularly in settings with small expert datasets and complex long-horizon tasks.
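
The sketch below shows only the vector-quantization component: snapping a continuous sub-goal embedding to its nearest codebook entry with a straight-through gradient. The LLM-guided labeling, dual-encoder training, and transition-augmented planner are not shown; sizes are assumptions.

import torch
import torch.nn as nn

class VQSubgoalSketch(nn.Module):
    """VQ-VAE style quantization: map a continuous sub-goal embedding to its nearest
    codebook entry, giving a discrete, reusable sub-goal vocabulary."""
    def __init__(self, n_codes=16, d=32):
        super().__init__()
        self.codebook = nn.Embedding(n_codes, d)
    def forward(self, z):
        dists = torch.cdist(z, self.codebook.weight)      # (B, n_codes)
        idx = dists.argmin(dim=-1)                         # discrete sub-goal id
        z_q = self.codebook(idx)
        z_q = z + (z_q - z).detach()                       # straight-through estimator
        return z_q, idx

z_q, idx = VQSubgoalSketch()(torch.randn(8, 32))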

--------------------------------------------------------------------------------------------------------

On Uncertainty In Natural Language Processing

This thesis explores uncertainty in Natural Language Processing (NLP) from linguistic, statistical, and neural perspectives. The research investigates how uncertainty can be reduced and quantified through experimental pipeline design and explores the effect of inductive model biases in text classification tasks. The work also proposes methods for calibrated sampling in natural language generation and quantifying confidence in large black-box language models. This comprehensive study of uncertainty in NLP has broad implications for improving the reliability and interpretability of language models across various applications, from machine translation and sentiment analysis to content generation and decision-support systems in fields like healthcare and finance.

Authors:  Dennis Ulmer

Link:  https://arxiv.org/abs/2410.03446v1

Date: 2024-10-04

Summary:

The last decade in deep learning has brought on increasingly capable systems that are deployed on a wide variety of applications. In natural language processing, the field has been transformed by a number of breakthroughs including large language models, which are used in increasingly many user-facing applications. In order to reap the benefits of this technology and reduce potential harms, it is important to quantify the reliability of model predictions and the uncertainties that shroud their development. This thesis studies how uncertainty in natural language processing can be characterized from a linguistic, statistical and neural perspective, and how it can be reduced and quantified through the design of the experimental pipeline. We further explore uncertainty quantification in modeling by theoretically and empirically investigating the effect of inductive model biases in text classification tasks. The corresponding experiments include data for three different languages (Danish, English and Finnish) and tasks as well as a large set of different uncertainty quantification approaches. Additionally, we propose a method for calibrated sampling in natural language generation based on non-exchangeable conformal prediction, which provides tighter token sets with better coverage of the actual continuation. Lastly, we develop an approach to quantify confidence in large black-box language models using auxiliary predictors, where the confidence is predicted from the input to and generated output text of the target model alone.
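
For the calibrated-sampling part, the sketch below shows a standard split-conformal prediction set over next-token probabilities; the thesis' non-exchangeable variant additionally weights calibration points by relevance, which is omitted here, and the score function is an assumption.

import numpy as np

def conformal_token_set(probs, calib_scores, alpha=0.1):
    """Standard split-conformal prediction set for next-token sampling: calibration
    scores are 1 - p(true token) on held-out data; at generation time keep every
    token whose score falls below the calibrated quantile."""
    n = len(calib_scores)
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    q_hat = np.quantile(calib_scores, min(q_level, 1.0), method="higher")
    return np.where(1.0 - probs <= q_hat)[0]              # token ids kept in the set

rng = np.random.default_rng(0)
calib = 1.0 - rng.beta(5, 2, size=1000)                   # stand-in calibration scores
probs = rng.dirichlet(np.ones(50))                        # stand-in next-token distribution
token_set = conformal_token_set(probs, calib, alpha=0.1)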

--------------------------------------------------------------------------------------------------------

Exploring How Non-Prehensile Manipulation Expands Capability in Robots Experiencing Multi-Joint Failure

This research investigates the use of non-prehensile manipulation (NPM) and whole-body interaction to enable robotic manipulators to perform tasks despite experiencing locked multi-joint (LMJ) failures. By modeling the failure-constrained workspace, generating a kinodynamic map of NPM actions, and using a sim-in-the-loop action planner, the approach significantly increases the reachable area in LMJ cases. This work has important implications for improving the robustness and adaptability of robotic systems in industrial, healthcare, and disaster response scenarios. The ability to continue operations despite joint failures could lead to more resilient robots in manufacturing, service robotics, and space exploration applications.

Authors:  Gilberto Briscoe-Martinez, Anuj Pasricha, Ava Abderezaei, Santosh Chaganti, Sarath Chandra Vajrala, Sri Kanth Popuri, Alessandro Roncone

Link:  https://arxiv.org/abs/2410.01102v1

Date: 2024-10-01

Summary:

This work explores non-prehensile manipulation (NPM) and whole-body interaction as strategies for enabling robotic manipulators to conduct manipulation tasks despite experiencing locked multi-joint (LMJ) failures. LMJs are critical system faults where two or more joints become inoperable; they impose constraints on the robot's configuration and control spaces, consequently limiting the capability and reach of a prehensile-only approach. This approach involves three components: i) modeling the failure-constrained workspace of the robot, ii) generating a kinodynamic map of NPM actions within this workspace, and iii) a manipulation action planner that uses a sim-in-the-loop approach to select the best actions to take from the kinodynamic map. The experimental evaluation shows that our approach can increase the failure-constrained reachable area in LMJ cases by 79%. Further, it demonstrates the ability to complete real-world manipulation with up to 88.9% success when the end-effector is unusable and up to 100% success when it is usable.

--------------------------------------------------------------------------------------------------------


EYE ON A.I. GETS READERS UP TO DATE ON THE LATEST FUNDING NEWS AND RELATED ISSUES. SUBSCRIBE FOR THE WEEKLY NEWSLETTER.