Week Ending 9.8.2024

RESEARCH WATCH: 9.8.2024

SPACE: A Python-based Simulator for Evaluating Decentralized Multi-Robot Task Allocation Algorithms

This paper proposes SPACE, a Python simulator for evaluating algorithms that assign tasks to robots in a swarm. SPACE allows researchers to easily implement decision-making algorithms and analyze their performance under various conditions. Applications: Develop and test robust algorithms for multi-robot collaboration in areas like search and rescue, environmental monitoring, and construction.

Authors: Inmo Jang

Link: https://arxiv.org/abs/2409.04230v1

Date: 2024-09-06

Summary:

Swarm robotics explores the coordination of multiple robots to achieve collective goals, with collective decision-making being a central focus. This process involves decentralized robots autonomously making local decisions and communicating them, which influences the overall emergent behavior. Testing such decentralized algorithms in real-world scenarios with hundreds or more robots is often impractical, underscoring the need for effective simulation tools. We propose SPACE (Swarm Planning and Control Evaluation), a Python-based simulator designed to support the research, evaluation, and comparison of decentralized Multi-Robot Task Allocation (MRTA) algorithms. SPACE streamlines core algorithmic development by allowing users to implement decision-making algorithms as Python plug-ins, easily construct agent behavior trees via an intuitive GUI, and leverage built-in support for inter-agent communication and local task awareness. To demonstrate its practical utility, we implement and evaluate CBBA and GRAPE within the simulator, comparing their performance across different metrics, particularly in scenarios with dynamically introduced tasks. This evaluation shows the usefulness of SPACE in conducting rigorous and standardized comparisons of MRTA algorithms, helping to support future research in the field.

--------------------------------------------------------------------------------------------------------

KAN See In the Dark

This work introduces a novel method based on Kolmogorov-Arnold Networks (KANs) for low-light image enhancement. KANs effectively capture non-linear dependencies, improving performance over traditional methods. Applications: Enhance low-light images captured by security cameras, smartphones, and other devices, improving visibility and safety in dark environments.

Authors: Aoxiang Ning, Minglong Xue, Jinhong He, Chengyun Song

Link: https://arxiv.org/abs/2409.03404v1

Date: 2024-09-05

Summary:

Existing low-light image enhancement methods struggle to fit the complex nonlinear relationship between normal and low-light images due to uneven illumination and noise effects. The recently proposed Kolmogorov-Arnold networks (KANs) feature spline-based convolutional layers and learnable activation functions, which can effectively capture nonlinear dependencies. In this paper, we design a KAN-Block based on KANs and innovatively apply it to low-light image enhancement. This method effectively alleviates the limitations of current methods constrained by linear network structures and lack of interpretability, further demonstrating the potential of KANs in low-level vision tasks. Given the poor perception of current low-light image enhancement methods and the stochastic nature of the inverse diffusion process, we further introduce frequency-domain perception for visually oriented enhancement. Extensive experiments demonstrate the competitive performance of our method on benchmark datasets. The code will be available at: https://github.com/AXNing/KSID

--------------------------------------------------------------------------------------------------------

Recent Advances in Attack and Defense Approaches of Large Language Models

This paper reviews recent research on LLM vulnerabilities and defenses. It analyzes attack vectors, evaluates existing defenses, and identifies future directions for enhancing LLM security. Applications: Guide developers in building more secure LLMs used in a wide range of applications like chatbots, machine translation, and text generation.

Authors: Jing Cui, Yishi Xu, Zhewei Huang, Shuchang Zhou, Jianbin Jiao, Junge Zhang

Link: https://arxiv.org/abs/2409.03274v2

Date: 2024-09-06

Summary:

Large Language Models (LLMs) have revolutionized artificial intelligence and machine learning through their advanced text processing and generating capabilities. However, their widespread deployment has raised significant safety and reliability concerns. Established vulnerabilities in deep neural networks, coupled with emerging threat models, may compromise security evaluations and create a false sense of security. Given the extensive research in the field of LLM security, we believe that summarizing the current state of affairs will help the research community better understand the present landscape and inform future developments. This paper reviews current research on LLM vulnerabilities and threats, and evaluates the effectiveness of contemporary defense mechanisms. We analyze recent studies on attack vectors and model weaknesses, providing insights into attack mechanisms and the evolving threat landscape. We also examine current defense strategies, highlighting their strengths and limitations. By contrasting advancements in attack and defense methodologies, we identify research gaps and propose future directions to enhance LLM security. Our goal is to advance the understanding of LLM safety challenges and guide the development of more robust security measures.

--------------------------------------------------------------------------------------------------------

TC-LLaVA: Rethinking the Transfer from Image to Video Understanding with Temporal Considerations

This work proposes TC-LLaVA, a method that improves video understanding by enhancing inter-layer attention computation within the LLM itself. TC-LLaVA considers temporal relationships between video frames, leading to improved performance. Applications: Analyze and understand video content for tasks like video captioning, activity recognition, and video summarization.

Authors: Mingze Gao, Jingyu Liu, Mingda Li, Jiangtao Xie, Qingbin Liu, Bo Zhao, Xi Chen, Hui Xiong

Link: https://arxiv.org/abs/2409.03206v1

Date: 2024-09-05

Summary:

Multimodal Large Language Models (MLLMs) have significantly improved performance across various image-language applications. Recently, there has been a growing interest in adapting image pre-trained MLLMs for video-related tasks. However, most efforts concentrate on enhancing the vision encoder and projector components, while the core part, Large Language Models (LLMs), remains comparatively under-explored. In this paper, we propose two strategies to enhance the model's capability in video understanding tasks by improving inter-layer attention computation in LLMs. Specifically, the first approach focuses on the enhancement of Rotary Position Embedding (RoPE) with Temporal-Aware Dual RoPE, which introduces temporal position information to strengthen the MLLM's temporal modeling capabilities while preserving the relative position relationships of both visual and text tokens. The second approach involves enhancing the Attention Mask with the Frame-wise Block Causal Attention Mask, a simple yet effective method that broadens visual token interactions within and across video frames while maintaining the causal inference mechanism. Based on these proposed methods, we adapt LLaVA for video understanding tasks, naming it Temporal-Considered LLaVA (TC-LLaVA). Our TC-LLaVA achieves new state-of-the-art performance across various video understanding benchmarks with only supervised fine-tuning (SFT) on video-related datasets.
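The Frame-wise Block Causal Attention Mask described in the summary can be illustrated with a small sketch. This is one plausible reading of the abstract, not the authors' implementation: it assumes visual tokens of the same frame may attend to each other in both directions, while every token still attends causally to earlier positions. The function name `frame_block_causal_mask` and the `frame_ids` encoding (frame index for a visual token, `None` for a text token) are hypothetical.

```python
def frame_block_causal_mask(frame_ids):
    """Return mask[i][j] == True when query token i may attend to key token j.

    frame_ids[k] is the frame index of a visual token, or None for a text
    token (a hypothetical encoding chosen for this sketch).
    """
    n = len(frame_ids)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Standard causal rule: a token sees itself and earlier positions.
            causal = j <= i
            # Assumed block rule: visual tokens of one frame see each other.
            same_frame = frame_ids[i] is not None and frame_ids[i] == frame_ids[j]
            mask[i][j] = causal or same_frame
    return mask


# Two frames of two visual tokens each, followed by two text tokens.
frame_ids = [0, 0, 1, 1, None, None]
mask = frame_block_causal_mask(frame_ids)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

Under these assumptions, the first token of a frame can see the later tokens of its own frame but not those of later frames, and text tokens keep an ordinary causal mask.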

--------------------------------------------------------------------------------------------------------

Configurable Foundation Models: Building LLMs from a Modular Perspective

This paper proposes a modular approach to building LLMs, where the model is decomposed into functional "bricks." This allows for efficient inference and dynamic configuration based on specific tasks. Applications: Develop more efficient and scalable LLMs that can be used on devices with limited resources, like smartphones and edge devices, enabling a wider range of on-device AI applications.

Authors: Chaojun Xiao, Zhengyan Zhang, Chenyang Song, Dazhi Jiang, Feng Yao, Xu Han, Xiaozhi Wang, Shuo Wang, Yufei Huang, Guanyu Lin, Yingfa Chen, Weilin Zhao, Yuge Tu, Zexuan Zhong, Ao Zhang, Chenglei Si, Khai Hao Moo, Chenyang Zhao, Huimin Chen, Yankai Lin, Zhiyuan Liu, Jingbo Shang, Maosong Sun

Link: https://arxiv.org/abs/2409.02877v1

Date: 2024-09-04

Summary:

Advancements in LLMs have recently unveiled challenges tied to computational efficiency and continual scalability due to their huge parameter counts, making the application and evolution of these models on devices with limited computation resources and in scenarios requiring various abilities increasingly cumbersome. Inspired by modularity within the human brain, there is a growing tendency to decompose LLMs into numerous functional modules, allowing for inference with a subset of modules and dynamic assembly of modules to tackle complex tasks, such as mixture-of-experts. To highlight the inherent efficiency and composability of the modular approach, we coin the term brick to represent each functional module, designating the modularized structure as configurable foundation models. In this paper, we offer a comprehensive overview and investigation of the construction, utilization, and limitations of configurable foundation models. We first formalize modules into emergent bricks - functional neuron partitions that emerge during the pre-training phase, and customized bricks - bricks constructed via additional post-training to improve the capabilities and knowledge of LLMs. Based on diverse functional bricks, we further present four brick-oriented operations: retrieval and routing, merging, updating, and growing. These operations allow for dynamic configuration of LLMs based on instructions to handle complex tasks. To verify our perspective, we conduct an empirical analysis on widely-used LLMs. We find that the FFN layers follow modular patterns with functional specialization of neurons and functional neuron partitions. Finally, we highlight several open issues and directions for future research. Overall, this paper aims to offer a fresh modular perspective on existing LLM research and inspire the future creation of more efficient and scalable foundational models.

--------------------------------------------------------------------------------------------------------

Action-Based ADHD Diagnosis in Video

This research introduces the first video-based method for ADHD diagnosis by analyzing specific actions in the video. This offers a potentially low-cost and efficient alternative to traditional methods. Applications: Develop cost-effective and accessible tools for ADHD screening and diagnosis, improving early detection and treatment for children with ADHD.

Authors: Yichun Li, Yuxing Yang, Syed Nohsen Naqvi

Link: https://arxiv.org/abs/2409.02261v1

Date: 2024-09-03

Summary:

Attention Deficit Hyperactivity Disorder (ADHD) causes significant impairment in various domains. Early diagnosis and treatment of ADHD could significantly improve quality of life and functioning. Recently, machine learning methods have improved the accuracy and efficiency of the ADHD diagnosis process. However, the cost of the equipment and trained staff required by existing methods is generally high. Therefore, we introduce a video-based frame-level action recognition network to ADHD diagnosis for the first time. We also record a real multi-modal ADHD dataset and extract three action classes from the video modality for ADHD diagnosis. The data from the whole process have been reported to the CNTW-NHS Foundation Trust, will be reviewed by medical consultants/professionals, and will be made public in due course.

--------------------------------------------------------------------------------------------------------

Latent Distillation for Continual Object Detection at the Edge

This research proposes Latent Distillation (LD), a method that reduces memory and computation requirements for continual object detection on edge devices. LD achieves competitive performance compared to other distillation methods. Applications: Develop more efficient and adaptable object detection models for edge devices used in robotics, automotive, and other dynamic environments.

Authors: Francesco Pasti, Marina Ceccon, Davide Dalle Pezze, Francesco Paissan, Elisabetta Farella, Gian Antonio Susto, Nicola Bellotto

Link: https://arxiv.org/abs/2409.01872v1

Date: 2024-09-03

Summary:

While numerous methods achieving remarkable performance exist in the Object Detection literature, addressing data distribution shifts remains challenging. Continual Learning (CL) offers solutions to this issue, enabling models to adapt to new data while maintaining performance on previous data. This is particularly pertinent for edge devices, common in dynamic environments like automotive and robotics. In this work, we address the memory and computation constraints of edge devices in the Continual Learning for Object Detection (CLOD) scenario. Specifically, (i) we investigate the suitability of an open-source, lightweight, and fast detector, namely NanoDet, for CLOD on edge devices, improving upon larger architectures used in the literature. Moreover, (ii) we propose a novel CL method, called Latent Distillation (LD), that reduces the number of operations and the memory required by state-of-the-art CL approaches without significantly compromising detection performance. Our approach is validated using the well-known VOC and COCO benchmarks, reducing the distillation parameter overhead by 74% and the Floating Point Operations (FLOPs) by 56% per model update compared to other distillation methods.

--------------------------------------------------------------------------------------------------------

Dialogue You Can Trust: Human and AI Perspectives on Generated Conversations

This work compares human and AI evaluation of dialogue systems across various factors like coherence, innovation, and factual accuracy. It reveals strengths and weaknesses of both approaches for assessment. Applications: Improve dialogue evaluation methods for developing more human-like and reliable AI communication tools like chatbots and virtual assistants.

Authors: Ike Ebubechukwu, Johane Takeuchi, Antonello Ceravola, Frank Joublin

Link: https://arxiv.org/abs/2409.01808v1

Date: 2024-09-03

Summary:

As dialogue systems and chatbots increasingly integrate into everyday interactions, the need for efficient and accurate evaluation methods becomes paramount. This study explores the comparative performance of human and AI assessments across a range of dialogue scenarios, focusing on seven key performance indicators (KPIs): Coherence, Innovation, Concreteness, Goal Contribution, Commonsense Contradiction, Incorrect Fact, and Redundancy. Utilizing the GPT-4o API, we generated a diverse dataset of conversations and conducted a two-part experimental analysis. In Experiment 1, we evaluated multi-party conversations on Coherence, Innovation, Concreteness, and Goal Contribution, revealing that GPT models align closely with human judgments. Notably, both human and AI evaluators exhibited a tendency towards binary judgment rather than linear scaling, highlighting a shared challenge in these assessments. Experiment 2 extended the work of Finch et al. (2023) by focusing on dyadic dialogues and assessing Commonsense Contradiction, Incorrect Fact, and Redundancy. The results indicate that while GPT-4o demonstrates strong performance in maintaining factual accuracy and commonsense reasoning, it still struggles with reducing redundancy and self-contradiction. Our findings underscore the potential of GPT models to closely replicate human evaluation in dialogue systems, while also pointing to areas for improvement. This research offers valuable insights for advancing the development and implementation of more refined dialogue evaluation methodologies, contributing to the evolution of more effective and human-like AI communication tools.

--------------------------------------------------------------------------------------------------------

Pureformer-VC: Non-parallel One-Shot Voice Conversion with Pure Transformer Blocks and Triplet Discriminative Training

This paper proposes Pureformer-VC, a voice conversion system that uses specific transformer blocks to effectively encode and transfer speaker characteristics. Applications: Develop more accurate and efficient one-shot voice conversion technology for various applications like voice anonymization and personalization.

Authors: Wenhan Yao, Zedong Xing, Xiarun Chen, Jia Liu, Yongqiang He, Weiping Wen

Link: https://arxiv.org/abs/2409.01668v2

Date: 2024-09-06

Summary:

One-shot voice conversion (VC) aims to change the timbre of any source speech to match that of the target speaker with only one speech sample. Existing style transfer-based VC methods relied on speech representation disentanglement and struggled to accurately and independently encode each speech component and to recompose the components into converted speech effectively. To tackle this, we proposed Pureformer-VC, which utilizes Conformer blocks to build a disentangled encoder, and Zipformer blocks to build a style transfer decoder as the generator. In the decoder, we used styleformer blocks to integrate speaker characteristics effectively into the generated speech. The models used the generative VAE loss for encoding components and triplet loss for unsupervised discriminative training. We applied the styleformer method to Zipformer's shared weights for style transfer. The experimental results show that the proposed model achieves comparable subjective scores and exhibits improvements in objective metrics compared to existing methods in a one-shot voice conversion scenario.
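For readers unfamiliar with the triplet loss mentioned in the summary, the following minimal sketch shows the standard margin-based form, max(0, d(a, p) - d(a, n) + margin). It illustrates the general technique only; the abstract does not specify the exact formulation, distance, or margin used in Pureformer-VC, so the Euclidean distance, the margin of 1.0, and the toy two-dimensional "speaker embeddings" are all assumptions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss (margin value is an assumption)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy embeddings: the anchor and same_speaker vectors are close together,
# while other_speaker is far away.
anchor = [0.0, 0.0]
same_speaker = [0.0, 1.0]
other_speaker = [3.0, 4.0]

# Well-separated triplet: the loss is already zero.
easy = triplet_loss(anchor, same_speaker, other_speaker)
# Swapped roles: the "positive" is farther than the "negative", so loss > 0.
hard = triplet_loss(anchor, other_speaker, same_speaker)
print(easy, hard)
```

The loss is zero once the positive is closer to the anchor than the negative by at least the margin, which is what pushes embeddings of the same speaker together and those of different speakers apart.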

--------------------------------------------------------------------------------------------------------

On the Design Space Between Transformers and Recursive Neural Nets

This work explores the connections between transformers and RvNNs, highlighting two recent models that bridge the gap between these architectures. These "bridge" models show promising performance in tasks where simpler models struggle. The authors identify areas for further research to leverage the strengths of both transformers and RvNNs.

Authors: Jishnu Ray Chowdhury, Cornelia Caragea

Link: https://arxiv.org/abs/2409.01531v1

Date: 2024-09-03

Summary:

In this paper, we study two classes of models, Recursive Neural Networks (RvNNs) and Transformers, and show that a tight connection between them emerges from the development of two recent models - Continuous Recursive Neural Networks (CRvNN) and Neural Data Routers (NDR). On one hand, CRvNN pushes the boundaries of traditional RvNN, relaxing its discrete structure-wise composition and ending up with a Transformer-like structure. On the other hand, NDR constrains the original Transformer to induce better structural inductive bias, ending up with a model that is close to CRvNN. Both models, CRvNN and NDR, show strong performance in algorithmic tasks and generalization in which simpler forms of RvNNs and Transformers fail. We explore these "bridge" models in the design space between RvNNs and Transformers, formalize their tight connections, discuss their limitations, and propose ideas for future research.

--------------------------------------------------------------------------------------------------------

Digital Twins in Additive Manufacturing: A Systematic Review

This review explores the current state of DTs in additive manufacturing (AM), addressing questions about their types, applications, limitations, and integration with advanced technologies. Applications: Advance AM processes by improving DT scalability, data integration, and computational power for real-time applications.

Authors: Md Manjurul Ahsan, Benjamin Bevans, Chris Billings, Alexander Riensche, Yingtao Liu, Shivakumar Raman, Zahed Siddique

Link: https://arxiv.org/abs/2409.00877v1

Date: 2024-09-02

Summary:

Digital Twins (DTs) are becoming popular in Additive Manufacturing (AM) due to their ability to create virtual replicas of physical components of AM machines, which helps in real-time production monitoring. Advanced techniques such as Machine Learning (ML), Augmented Reality (AR), and simulation-based models play key roles in developing intelligent and adaptable DTs in manufacturing processes. However, questions remain regarding scalability, the integration of high-quality data, and the computational power required for real-time applications in developing DTs. Understanding the current state of DTs in AM is essential to address these challenges and fully utilize their potential in advancing AM processes. Considering this opportunity, this work aims to provide a comprehensive overview of DTs in AM by addressing the following four research questions: (1) What are the key types of DTs used in AM and their specific applications? (2) What are the recent developments and implementations of DTs? (3) How are DTs employed in process improvement and hybrid manufacturing? (4) How are DTs integrated with Industry 4.0 technologies? By discussing current applications and techniques, we aim to offer a better understanding and potential future research directions for researchers and practitioners in AM and DTs.

--------------------------------------------------------------------------------------------------------

LUK: Empowering Log Understanding with Expert Knowledge from Large Language Models

This work introduces LUK, a framework that leverages the knowledge of larger LLMs to empower smaller PLMs for log understanding. LUK utilizes multi-expert collaboration and specialized pre-training tasks to achieve this. Applications: Improve log analysis for system monitoring and troubleshooting in various domains.

Authors: Lipeng Ma, Weidong Yang, Sihang Jiang, Ben Fei, Mingjie Zhou, Shuhao Li, Bo Xu, Yanghua Xiao

Link: https://arxiv.org/abs/2409.01909v1

Date: 2024-09-03

Summary:

Logs play a critical role in providing essential information for system monitoring and troubleshooting. Recently, with the success of pre-trained language models (PLMs) and large language models (LLMs) in natural language processing (NLP), smaller PLMs (such as BERT) and LLMs (like ChatGPT) have become the current mainstream approaches for log analysis. While LLMs possess rich knowledge, their high computational costs and unstable performance make LLMs impractical for analyzing logs directly. In contrast, smaller PLMs can be fine-tuned for specific tasks even with limited computational resources, making them more practical. However, these smaller PLMs face challenges in understanding logs comprehensively due to their limited expert knowledge. To better utilize the knowledge embedded within LLMs for log understanding, this paper introduces a novel knowledge enhancement framework, called LUK, which acquires expert knowledge from LLMs to empower log understanding on a smaller PLM. Specifically, we design a multi-expert collaboration framework based on LLMs consisting of different roles to acquire expert knowledge. In addition, we propose two novel pre-training tasks to enhance the log pre-training with expert knowledge. LUK achieves state-of-the-art results on different log analysis tasks and extensive experiments demonstrate expert knowledge from LLMs can be utilized more effectively to understand logs.

--------------------------------------------------------------------------------------------------------

Abstractive Text Summarization: State of the Art, Challenges, and Improvements

This paper provides a comprehensive overview of the field of abstractive text summarization, covering state-of-the-art techniques, challenges, and future directions. Applications: Advance the development of abstractive summarization techniques for applications like news summarization and text analysis.

Authors: Hassan Shakil, Ahmad Farooq, Jugal Kalita

Link: https://arxiv.org/abs/2409.02413v1

Date: 2024-09-04

Summary:

Specifically focusing on the landscape of abstractive text summarization, as opposed to extractive techniques, this survey presents a comprehensive overview, delving into state-of-the-art techniques, prevailing challenges, and prospective research directions. We categorize the techniques into traditional sequence-to-sequence models, pre-trained large language models, reinforcement learning, hierarchical methods, and multi-modal summarization. Unlike prior works that did not examine complexities, scalability and comparisons of techniques in detail, this review takes a comprehensive approach encompassing state-of-the-art methods, challenges, solutions, comparisons, and limitations, and charts out future improvements, providing researchers an extensive overview to advance abstractive summarization research. We provide comparison tables across the categorized techniques, offering insights into model complexity, scalability and appropriate applications. The paper highlights challenges such as inadequate meaning representation, factual consistency, controllable text summarization, cross-lingual summarization, and evaluation metrics, among others. Solutions leveraging knowledge incorporation and other innovative strategies are proposed to address these challenges. The paper concludes by highlighting emerging research areas like factual inconsistency, domain-specific, cross-lingual, multilingual, and long-document summarization, as well as handling noisy data. Our objective is to provide researchers and practitioners with a structured overview of the domain, enabling them to better understand the current landscape and identify potential areas for further research and improvement.

--------------------------------------------------------------------------------------------------------

LASP: Surveying the State-of-the-Art in Large Language Model-Assisted AI Planning

Large language models (LLMs) have shown potential for automated planning due to their ability to reason and generate action sequences. This survey explores the current research on using LLMs for AI planning, highlighting challenges and opportunities in areas like embodied environments, optimal scheduling, and reasoning.

Authors: Haoming Li, Zhaoliang Chen, Jonathan Zhang, Fei Liu

Link: https://arxiv.org/abs/2409.01806v1

Date: 2024-09-03

Summary:

Effective planning is essential for the success of any task, from organizing a vacation to routing autonomous vehicles and developing corporate strategies. It involves setting goals, formulating plans, and allocating resources to achieve them. LLMs are particularly well-suited for automated planning due to their strong capabilities in commonsense reasoning. They can deduce a sequence of actions needed to achieve a goal from a given state and identify an effective course of action. However, it is frequently observed that plans generated through direct prompting often fail upon execution. Our survey aims to highlight the existing challenges in planning with language models, focusing on key areas such as embodied environments, optimal scheduling, competitive and cooperative games, task decomposition, reasoning, and planning. Through this study, we explore how LLMs transform AI planning and provide unique insights into the future of LM-assisted planning.

--------------------------------------------------------------------------------------------------------

How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data

Training large language models (LLMs) to generate code effectively requires high-quality data. This paper investigates the impact of data quality on code LLM performance. It reveals limitations in existing datasets and proposes a new method for selecting good training data. The authors demonstrate that their approach leads to superior code generation compared to models trained on standard datasets. This research can improve the development of LLMs for assisting programmers with code creation and automation.

Authors: Yejie Wang, Keqing He, Dayuan Fu, Zhuoma Gongque, Heyang Xu, Yanxu Chen, Zhexu Wang, Yujia Fu, Guanting Dong, Muxi Diao, Jingang Wang, Mengdi Zhang, Xunliang Cai, Weiran Xu

Link: https://arxiv.org/abs/2409.03810v1

Date: 2024-09-05

Summary:

Recently, there has been a growing interest in studying how to construct better code instruction tuning data. However, we observe that code models trained with these datasets exhibit high performance on HumanEval but perform worse on other benchmarks such as LiveCodeBench. Upon further investigation, we find that many datasets suffer from severe data leakage. After cleaning up most of the leaked data, some well-known high-quality datasets perform poorly. This discovery reveals a new challenge: identifying which datasets genuinely qualify as high-quality code instruction data. To address this, we propose an efficient code data pruning strategy for selecting good samples. Our approach is based on three dimensions: instruction complexity, response quality, and instruction diversity. Based on our selected data, we present XCoder, a family of models finetuned from LLaMA3. Our experiments show XCoder achieves new state-of-the-art performance using less training data, which verifies the effectiveness of our data strategy. Moreover, we perform a comprehensive analysis on the data composition and find existing code datasets have different characteristics according to their construction methods, which provides new insights for future code LLMs. Our models and dataset are released at https://github.com/banksy23/XCoder
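The three selection dimensions named in the summary (instruction complexity, response quality, instruction diversity) can be illustrated with a generic pruning sketch. This is not the XCoder pipeline: the abstract does not say how the scores are computed or combined, so the precomputed `complexity` and `quality` scores, their simple sum, the cosine-distance diversity filter, and the `min_distance` threshold are all hypothetical choices made for illustration.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def select_samples(samples, budget, min_distance=0.1):
    """Greedy selection: rank by score, skip near-duplicates of chosen items."""
    # Assumed ranking rule: sum of complexity and quality scores.
    ranked = sorted(samples, key=lambda s: s["complexity"] + s["quality"], reverse=True)
    selected = []
    for sample in ranked:
        if len(selected) >= budget:
            break
        # Assumed diversity rule: keep only samples far from everything chosen.
        if all(
            cosine_distance(sample["embedding"], kept["embedding"]) >= min_distance
            for kept in selected
        ):
            selected.append(sample)
    return selected


# Toy pool with made-up scores and 2-D embeddings.
pool = [
    {"id": "a", "complexity": 0.9, "quality": 0.8, "embedding": [1.0, 0.0]},
    {"id": "b", "complexity": 0.85, "quality": 0.8, "embedding": [0.99, 0.05]},
    {"id": "c", "complexity": 0.4, "quality": 0.9, "embedding": [0.0, 1.0]},
    {"id": "d", "complexity": 0.2, "quality": 0.3, "embedding": [0.7, 0.7]},
]
picked = select_samples(pool, budget=2)
print([s["id"] for s in picked])
```

In this toy pool, sample "b" scores well but is nearly identical to "a" in embedding space, so the diversity filter passes over it in favor of the lower-scoring but distinct "c".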

--------------------------------------------------------------------------------------------------------

Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining

Pretraining language models on massive datasets is crucial for their performance. However, the definition of "high-quality data" in this context remains unclear. This paper focuses on code pretraining and introduces Arctic-SnowCoder, a model trained with a focus on high-quality data. By meticulously selecting and refining training data, Arctic-SnowCoder achieves state-of-the-art performance despite using a smaller dataset than some competitors. This research sheds light on the importance of data quality for code LLMs and highlights the potential for smaller, more efficient models.

Authors: Yuxiang Wei, Hojae Han, Rajhans Samdani

Link: https://arxiv.org/abs/2409.02326v1

Date: 2024-09-03

Summary:

Recent studies have been increasingly demonstrating that high-quality data is crucial for effective pretraining of language models. However, the precise definition of "high-quality" remains underexplored. Focusing on the code domain, we introduce Arctic-SnowCoder-1.3B, a data-efficient base code model pretrained on 555B tokens through three phases of progressively refined data: (1) general pretraining with 500B standard-quality code tokens, preprocessed through basic filtering, deduplication, and decontamination, (2) continued pretraining with 50B high-quality tokens, selected from phase one by a BERT-style quality annotator trained to distinguish good code from random data, using positive examples drawn from high-quality code files, along with instruction data from Magicoder and StarCoder2-Instruct, and (3) enhanced pretraining with 5B synthetic data created by Llama-3.1-70B using phase two data as seeds, adapting the Magicoder approach for pretraining. Despite being trained on a limited dataset, Arctic-SnowCoder achieves state-of-the-art performance on BigCodeBench, a coding benchmark focusing on practical and challenging programming tasks, compared to similarly sized models trained on no more than 1T tokens, outperforming Phi-1.5-1.3B by 36%. Across all evaluated benchmarks, Arctic-SnowCoder-1.3B beats StarCoderBase-3B pretrained on 1T tokens. Additionally, it matches the performance of leading small base code models trained on trillions of tokens. For example, Arctic-SnowCoder-1.3B surpasses StarCoder2-3B, pretrained on over 3.3T tokens, on HumanEval+, a benchmark that evaluates function-level code generation, and remains competitive on BigCodeBench. Our evaluation presents a comprehensive analysis justifying various design choices for Arctic-SnowCoder. Most importantly, we find that the key to high-quality data is its alignment with the distribution of downstream applications.
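The token budget stated in the summary can be checked with a few lines of arithmetic. The phase sizes (500B, 50B, 5B) come from the abstract; the variable names and labels are illustrative only.

```python
# Token counts per phase, in billions, as stated in the abstract.
phases_billion = {
    "phase 1: general pretraining": 500,
    "phase 2: continued pretraining (high-quality)": 50,
    "phase 3: enhanced pretraining (synthetic)": 5,
}

total_billion = sum(phases_billion.values())
for name, tokens in phases_billion.items():
    print(f"{name}: {tokens}B tokens ({tokens / total_billion:.1%} of total)")

# Share of the budget spent on the two refined phases (2 and 3) combined.
refined_share = (50 + 5) / total_billion
print(f"total: {total_billion}B tokens, refined share: {refined_share:.1%}")
```

The three phases sum to the 555B tokens quoted in the abstract, and the two refined phases together account for roughly a tenth of that budget.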

--------------------------------------------------------------------------------------------------------

On Using Curved Mirrors to Decrease Shadowing in VLC

Visible light communication (VLC) offers an alternative to radio waves for data transmission indoors, but obstacles can block the signal. This paper explores using curved mirrors instead of flat ones to improve VLC. Curved mirrors provide a broader reflection, potentially eliminating signal shadows and improving connectivity even when a direct line of sight is blocked. This research could lead to more robust and reliable VLC systems for high-traffic indoor environments.

Authors: Borja Genoves Guzman, Ana Garcia Armada, Maïté Brandt-Pearce

Link: https://arxiv.org/abs/2409.03378v1

Date: 2024-09-05

Summary:

Visible light communication (VLC) complements radio frequency in indoor environments with large wireless data traffic. However, VLC is hindered by dramatic path losses when an opaque object is interposed between the transmitter and the receiver. Prior works propose the use of plane mirrors as optical reconfigurable intelligent surfaces (ORISs) to enhance communications through non-line-of-sight links. Plane mirrors rely on their orientation to forward the light to the target user location, which is challenging to implement in practice. This paper studies the potential of curved mirrors as static reflective surfaces to provide a broadening specular reflection that increases the signal coverage in mirror-assisted VLC scenarios. We study the behavior of paraboloid and semi-spherical mirrors and derive the irradiance equations. We provide extensive numerical and analytical results and show that curved mirrors, when developed with proper dimensions, may reduce the shadowing probability to zero, while static plane mirrors of the same size have shadowing probabilities larger than 65%. Furthermore, the signal-to-noise ratio offered by curved mirrors may suffice to provide connectivity to users deployed in the room even when a line-of-sight link blockage occurs.
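The "broadening specular reflection" idea can be illustrated with a toy 2D ray sketch in Python. It is not the paper's irradiance model: it assumes parallel incoming rays travelling straight down and a hypothetical convex arc whose surface normals tilt up to 20 degrees either side of vertical. It applies the standard reflection law r = d - 2(d·n)n and compares the angular spread of reflected rays for a flat and a curved mirror.

```python
import math


def reflect(direction, normal):
    """Specular reflection of a direction about a unit normal: r = d - 2 (d.n) n."""
    dot = direction[0] * normal[0] + direction[1] * normal[1]
    return (direction[0] - 2 * dot * normal[0],
            direction[1] - 2 * dot * normal[1])


def angular_spread_deg(normals, incoming=(0.0, -1.0)):
    """Range (degrees) of reflected directions over a set of surface normals.

    Assumes the reflected angles do not wrap around +/-180 degrees,
    which holds for the small tilts used here.
    """
    angles = []
    for n in normals:
        rx, ry = reflect(incoming, n)
        angles.append(math.degrees(math.atan2(ry, rx)))
    return max(angles) - min(angles)


def arc_normals(half_angle_deg, samples=5):
    """Unit normals of a convex arc, tilted evenly across +/- half_angle_deg."""
    step = 2 * half_angle_deg / (samples - 1)
    tilts = [-half_angle_deg + i * step for i in range(samples)]
    return [(math.sin(math.radians(t)), math.cos(math.radians(t))) for t in tilts]


# Flat mirror: the same upward normal at every point (hypothetical setup).
flat_spread = angular_spread_deg([(0.0, 1.0)] * 5)
# Curved mirror: normals vary along the surface (hypothetical 20-degree arc).
curved_spread = angular_spread_deg(arc_normals(20.0))

print(f"flat mirror spread:   {flat_spread:.1f} degrees")
print(f"curved mirror spread: {curved_spread:.1f} degrees")
```

In this toy setup the flat mirror sends every ray in the same direction, while the curved mirror fans the reflected rays out over twice the tilt range of its normals. That is the geometric intuition behind a static curved mirror covering more of the room than a static plane mirror.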

--------------------------------------------------------------------------------------------------------

From Data to Insights: A Covariate Analysis of the IARPA BRIAR Dataset for Multimodal Biometric Recognition Algorithms at Altitude and Range

Facial recognition is often less accurate at long distances or from unusual angles. This paper analyzes a dataset of biometric recognition using drones at various distances and elevations. It investigates how factors like weather and camera resolution affect the accuracy of these systems. The findings can guide the development of more reliable biometric identification for security applications using drones or other long-range platforms.

Authors: David S. Bolme, Deniz Aykac, Ryan Shivers, Joel Brogan, Nell Barber, Bob Zhang, Laura Davies, David Cornett III

Link: https://arxiv.org/abs/2409.01514v1

Date: 2024-09-03

Summary:

This paper examines covariate effects on fused whole-body biometric performance in the IARPA BRIAR dataset, specifically focusing on UAV platforms, elevated positions, and distances up to 1000 meters. The dataset includes outdoor videos compared with indoor images and controlled gait recordings. Normalized raw fusion scores relate directly to predicted false accept rates (FAR), offering an intuitive means for interpreting model results. A linear model is developed to predict biometric algorithm scores, analyzing their performance to identify the most influential covariates on accuracy at altitude and range. Weather factors like temperature, wind speed, solar loading, and turbulence are also investigated in this analysis. The study found that resolution and camera distance best predicted accuracy. These findings can guide future research and development efforts in long-range/elevated/UAV biometrics and support the creation of more reliable and robust systems for national security and other critical domains.
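As a rough illustration of fitting a linear model of scores on covariates, the Python sketch below runs ordinary least squares on a tiny synthetic table. Everything in it is an assumption made for illustration: the covariate names (`distance_km`, `resolution`), the data points, and the generating coefficients are invented and are not taken from the BRIAR dataset or the paper's model.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_linear(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)


# Synthetic covariates: (distance_km, resolution on a 0-1 scale). Hypothetical.
X = [(0.1, 0.9), (0.3, 0.8), (0.5, 0.4), (0.7, 0.5), (1.0, 0.1), (0.2, 0.3)]
# Synthetic scores generated from made-up coefficients, with no noise.
y = [1.0 - 0.5 * d + 0.8 * r for d, r in X]

intercept, distance_coef, resolution_coef = fit_linear(X, y)
print(f"intercept={intercept:.3f}, distance={distance_coef:.3f}, "
      f"resolution={resolution_coef:.3f}")
```

With covariates on comparable scales, the magnitude of each fitted coefficient gives a first indication of which covariate moves the score most. A real analysis would also need noise, more covariates, and significance testing.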

--------------------------------------------------------------------------------------------------------

Prediction Accuracy & Reliability: Classification and Object Localization under Distribution Shift

Convolutional neural networks (CNNs) used in self-driving cars can struggle with unexpected weather conditions. This research examines how rain, fog, and other environmental changes affect the accuracy of CNNs in classifying and locating objects on the road. It also compares different methods to improve the reliability of CNNs under these conditions. This work can contribute to safer self-driving cars by improving their ability to handle diverse weather scenarios.

Authors: Fabian Diet, Moussa Kassem Sbeyti, Michelle Karg

Link: https://arxiv.org/abs/2409.03543v1

Date: 2024-09-05

Summary:

Natural distribution shift causes a deterioration in the perception performance of convolutional neural networks (CNNs). This comprehensive analysis for real-world traffic data addresses: 1) investigating the effect of natural distribution shift and weather augmentations on both detection quality and confidence estimation, 2) evaluating model performance for both classification and object localization, and 3) benchmarking two common uncertainty quantification methods - Ensembles and different variants of Monte-Carlo (MC) Dropout - under natural and close-to-natural distribution shift. For this purpose, a novel dataset has been curated from publicly available autonomous driving datasets. The in-distribution (ID) data is based on cutouts of a single object, for which both class and bounding box annotations are available. The six distribution-shift datasets cover adverse weather scenarios, simulated rain and fog, corner cases, and out-of-distribution data. A granular analysis of CNNs under distribution shift makes it possible to quantify the impact of different types of shifts on both task performance and confidence estimation: ConvNeXt-Tiny is more robust than EfficientNet-B0; heavy rain degrades classification more strongly than localization, contrary to heavy fog; and integrating MC-Dropout into selected layers only has the potential to enhance task performance and confidence estimation, whereby the identification of these layers depends on the type of distribution shift and the considered task.
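Both Ensembles and MC Dropout produce several class-probability vectors for the same input, either from different models or from repeated stochastic forward passes. A common way to turn these into a confidence estimate is to average them and take the entropy of the result. The Python sketch below shows that aggregation step only. The probability vectors are hypothetical stand-ins for network outputs, and the paper's exact uncertainty metrics may differ.

```python
import math


def predictive_mean(passes):
    """Average class probabilities over stochastic passes or ensemble members."""
    n = len(passes)
    num_classes = len(passes[0])
    return [sum(p[c] for p in passes) / n for c in range(num_classes)]


def entropy(probs):
    """Shannon entropy in nats; higher means a less confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical softmax outputs from three passes on an in-distribution input:
# every pass agrees on class 0.
agreeing = [[0.90, 0.05, 0.05]] * 3

# Hypothetical outputs on a shifted input: each pass picks a different class.
disagreeing = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
]

entropy_agreeing = entropy(predictive_mean(agreeing))
entropy_disagreeing = entropy(predictive_mean(disagreeing))

print(f"entropy when passes agree:    {entropy_agreeing:.3f}")
print(f"entropy when passes disagree: {entropy_disagreeing:.3f}")
```

Each individual pass is equally confident in both cases. Only the averaged prediction shows the disagreement, which is why sampling-based methods can flag inputs that a single forward pass would label with high confidence.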

--------------------------------------------------------------------------------------------------------

Governing dual-use technologies: Case studies of international security agreements and lessons for AI governance

As Artificial Intelligence (AI) advances, concerns about its potential misuse grow. This paper explores how existing international agreements on managing dangerous technologies can inform the development of AI governance frameworks. The researchers examined historical and current agreements on dual-use technologies (those with both civilian and military applications) in areas such as nuclear security, chemical weapons, biosecurity, and export controls. By analyzing these agreements' purposes, structures, and enforcement mechanisms, they aim to identify key features for effective AI governance. This research can help establish international cooperation and regulations to mitigate risks associated with advanced AI.

Authors: Akash R. Wasil, Peter Barnett, Michael Gerovitch, Roman Hauksson, Tom Reed, Jack William Miller

Link: https://arxiv.org/abs/2409.02779v1

Date: 2024-09-04

Summary:

International AI governance agreements and institutions may play an important role in reducing global security risks from advanced AI. To inform the design of such agreements and institutions, we conducted case studies of historical and contemporary international security agreements. We focused specifically on those arrangements around dual-use technologies, examining agreements in nuclear security, chemical weapons, biosecurity, and export controls. For each agreement, we examined four key areas: (a) purpose, (b) core powers, (c) governance structure, and (d) instances of non-compliance. From these case studies, we extracted lessons for the design of international AI agreements and governance institutions. We discuss the importance of robust verification methods, strategies for balancing power between nations, mechanisms for adapting to rapid technological change, approaches to managing trade-offs between transparency and security, incentives for participation, and effective enforcement mechanisms.

--------------------------------------------------------------------------------------------------------


Artificial Intelligence, Research Watch | Craig Smith | September 9, 2024