Week Ending 3.23.2025

 

RESEARCH WATCH: 3.23.2025

 

SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging

Fine-tuning large language models (LLMs) to perform specific tasks often degrades their inherent safety features. This paper addresses this critical issue by introducing SafeMERGE, a post-fine-tuning framework that selectively merges layers to restore safety without sacrificing task performance. By analyzing deviations from safe behavior, SafeMERGE intelligently combines fine-tuned and safety-aligned model layers. This method is vital for deploying LLMs in sensitive applications where safety and task utility must coexist, such as healthcare, legal, and educational contexts, ensuring responsible AI deployment.

Authors:  Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Syed Zawad, Holger Boche

Link:  https://arxiv.org/abs/2503.17239v1

Date: 2025-03-21

Summary:

Fine-tuning large language models (LLMs) on downstream tasks can inadvertently erode their safety alignment, even for benign fine-tuning datasets. We address this challenge by proposing SafeMERGE, a post-fine-tuning framework that preserves safety while maintaining task utility. It achieves this by selectively merging fine-tuned and safety-aligned model layers only when those deviate from safe behavior, measured by a cosine similarity criterion. We evaluate SafeMERGE against other fine-tuning- and post-fine-tuning-stage approaches for Llama-2-7B-Chat and Qwen-2-7B-Instruct models on GSM8K and PubMedQA tasks while exploring different merging strategies. We find that SafeMERGE consistently reduces harmful outputs compared to other baselines without significantly sacrificing performance, sometimes even enhancing it. The results suggest that our selective, subspace-guided, and per-layer merging method provides an effective safeguard against the inadvertent loss of safety in fine-tuned LLMs while outperforming simpler post-fine-tuning-stage defenses.
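A minimal sketch of the layer-selective merging idea, assuming a simple linear merge over flattened weight tensors; the threshold tau and merge weight alpha are illustrative, and the paper's subspace-guided criterion is more involved:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two flattened weight tensors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def selective_layer_merge(finetuned, safety_aligned, tau=0.99, alpha=0.5):
    """Merge only layers whose fine-tuned weights drift from the safety-aligned
    reference (cosine similarity below tau). `finetuned` and `safety_aligned`
    map layer name -> weight array; tau and alpha are illustrative values."""
    merged = {}
    for name, w_ft in finetuned.items():
        w_safe = safety_aligned[name]
        if cosine_sim(w_ft, w_safe) < tau:                    # layer deviates from safe behavior
            merged[name] = alpha * w_ft + (1 - alpha) * w_safe  # simple linear merge
        else:
            merged[name] = w_ft                               # keep the task-tuned layer as-is
    return merged
```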

--------------------------------------------------------------------------------------------------------

A New Segment Routing method with Swap Node Selection Strategy Based on Deep Reinforcement Learning for Software Defined Network

Traditional segment routing (SR) methods in software-defined networks (SDNs) face challenges with dynamic routing changes and flow table issuance efficiency. This paper proposes a novel SR method using deep reinforcement learning (DRL-SR) to optimize both routing and swap node selection simultaneously. By modeling traffic conditions and flow table issuance time, DRL-SR enhances network performance, reducing delays and packet losses. This approach is crucial for improving network reliability and efficiency in data centers and telecommunication networks, particularly in handling high-traffic and dynamic environments.

Authors:  Miao Ye, Jihao Zheng, Qiuxiang Jiang, Yuan Huang, Ziheng Wang, Yong Wang

Link:  https://arxiv.org/abs/2503.16914v1

Date: 2025-03-21

Summary:

Existing segment routing (SR) methods determine the routing first and then use path segmentation approaches to select swap nodes and form a segment routing path (SRP), so the path must be re-segmented whenever the routing changes. Furthermore, they do not consider flow table issuance time and therefore cannot maximize the speed of flow table issuance. To address these issues, this paper establishes an optimization model that simultaneously forms routing strategies and path segmentation strategies for selecting appropriate swap nodes to reduce flow table issuance time. It also designs an intelligent segment routing algorithm based on deep reinforcement learning (DRL-SR) to solve the proposed model. First, a traffic matrix is designed as the state space for the deep reinforcement learning agent; this matrix includes multiple QoS performance indicators, flow table issuance time overhead, and SR label stack depth. Second, the action selection strategy and corresponding reward function are designed so that the agent selects the next node with the routing in mind; in addition, a strategy for deciding whether a newly added node is selected as a swap node, along with its corresponding reward function, is designed to account for the time cost incurred when the controller issues the flow table to the swap node. Finally, a series of experiments shows that, compared with existing methods, the designed segmented route optimization model and the intelligent solution algorithm (DRL-SR) reduce the time overhead required to complete the segmented route establishment task while optimizing performance metrics such as throughput, delay, and packet loss.
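For illustration, a rough sketch of the state and action structure the summary describes; the field names, shapes, and reward weights below are assumptions, not the authors' implementation:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class SRState:
    traffic_matrix: np.ndarray   # (n_nodes, n_nodes) traffic demands
    qos_indicators: np.ndarray   # per-link QoS metrics (delay, loss, throughput)
    issuance_time: float         # flow table issuance time overhead so far
    label_stack_depth: int       # current SR label stack depth

@dataclass
class SRAction:
    next_node: int               # routing decision: which node extends the path
    make_swap_node: bool         # segmentation decision: mark this node as a swap node

def reward(delay: float, loss: float, issuance_time: float,
           w1: float = 1.0, w2: float = 1.0, w3: float = 1.0) -> float:
    """Illustrative reward penalizing delay, packet loss, and issuance time."""
    return -(w1 * delay + w2 * loss + w3 * issuance_time)
```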

--------------------------------------------------------------------------------------------------------

Do Visual Imaginations Improve Vision-and-Language Navigation Agents?

Vision-and-language navigation (VLN) agents rely on natural language instructions to navigate environments. This study investigates the impact of visual imaginations, generated using text-to-image diffusion models, on navigation performance. By providing agents with visual cues of sub-goals, the approach enhances their understanding of the environment. This research is pivotal for developing more intuitive and efficient navigation systems for robots and virtual assistants, improving their ability to interpret and act on language instructions in real-world scenarios.

Authors:  Akhil Perincherry, Jacob Krantz, Stefan Lee

Link:  https://arxiv.org/abs/2503.16394v1

Date: 2025-03-20

Summary:

Vision-and-Language Navigation (VLN) agents are tasked with navigating an unseen environment using natural language instructions. In this work, we study if visual representations of sub-goals implied by the instructions can serve as navigational cues and lead to increased navigation performance. To synthesize these visual representations or imaginations, we leverage a text-to-image diffusion model on landmark references contained in segmented instructions. These imaginations are provided to VLN agents as an added modality to act as landmark cues and an auxiliary loss is added to explicitly encourage relating these with their corresponding referring expressions. Our findings reveal an increase in success rate (SR) of around 1 point and up to 0.5 points in success scaled by inverse path length (SPL) across agents. These results suggest that the proposed approach reinforces visual understanding compared to relying on language instructions alone. Code and data for our work can be found at https://www.akhilperincherry.com/VLN-Imagine-website/.
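A hedged sketch of how sub-goal imaginations could be synthesized with an off-the-shelf diffusion model via the Hugging Face diffusers library; the paper's actual model choice, prompt construction, and agent integration may differ:

```python
import torch
from diffusers import StableDiffusionPipeline

# Assumed setup: any text-to-image checkpoint works for the sketch.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

instruction_landmarks = [
    "the wooden staircase",              # landmark references segmented from an instruction
    "the blue sofa in the living room",
]

imaginations = []
for phrase in instruction_landmarks:
    image = pipe(f"a photo of {phrase}, indoor scene").images[0]
    imaginations.append(image)           # fed to the VLN agent as an extra visual modality
```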

--------------------------------------------------------------------------------------------------------

Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation

Large language models (LLMs) have shown remarkable reasoning capabilities through long chain-of-thought (CoT) reasoning. This paper introduces DLCoT, a framework to enhance the distillation of long CoT reasoning into smaller, more efficient models. By segmenting, simplifying, and optimizing reasoning paths, DLCoT improves model performance and token efficiency. This research is crucial for deploying advanced reasoning capabilities in resource-constrained environments, such as mobile devices or embedded systems, where efficient processing is essential.

Authors:  Yijia Luo, Yulin Song, Xingyao Zhang, Jiaheng Liu, Weixun Wang, GengRu Chen, Wenbo Su, Bo Zheng

Link:  https://arxiv.org/abs/2503.16385v1

Date: 2025-03-20

Summary:

Recent advancements in large language models (LLMs) have demonstrated remarkable reasoning capabilities through long chain-of-thought (CoT) reasoning. The R1 distillation scheme has emerged as a promising approach for training cost-effective models with enhanced reasoning abilities. However, the underlying mechanisms driving its effectiveness remain unclear. This study examines the universality of distillation data and identifies key components that enable the efficient transfer of long-chain reasoning capabilities in LLM distillation. Our findings reveal that the effectiveness of long CoT reasoning distillation from teacher models like Qwen-QwQ degrades significantly on nonhomologous models, challenging the assumed universality of current distillation methods. To gain deeper insights into the structure and patterns of long CoT reasoning, we propose DLCoT (Deconstructing Long Chain-of-Thought), a distillation data enhancement framework. DLCoT consists of three key steps: (1) data segmentation to decompose complex long CoT structures, (2) simplification by eliminating unsolvable and redundant solutions, and (3) optimization of intermediate error states. Our approach significantly improves model performance and token efficiency, facilitating the development of high-performance LLMs.
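An illustrative skeleton of the three DLCoT steps; the helper logic below is a naive placeholder meant only to show the structure of the pipeline, not the authors' implementation:

```python
def segment_cot(long_cot: str) -> list[str]:
    """Step 1: decompose a long chain-of-thought into structural segments."""
    return long_cot.split("\n\n")               # naive segmentation for illustration

def simplify(segments: list[str]) -> list[str]:
    """Step 2: drop redundant or unsolvable solution attempts."""
    seen, kept = set(), []
    for seg in segments:
        key = seg.strip().lower()
        if key and key not in seen:              # crude redundancy filter
            seen.add(key)
            kept.append(seg)
    return kept

def optimize_errors(segments: list[str]) -> list[str]:
    """Step 3: trim intermediate error states before distillation."""
    return [s for s in segments if "i made a mistake" not in s.lower()]

def dlcot_clean(long_cot: str) -> str:
    return "\n\n".join(optimize_errors(simplify(segment_cot(long_cot))))
```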

--------------------------------------------------------------------------------------------------------

PromptMobile: Efficient Promptus for Low Bandwidth Mobile Video Streaming

Video streaming at low bandwidths often suffers from significant quality degradation. Promptus offers a solution by reducing bandwidth requirements, but its computational intensity hinders real-time mobile applications. This paper presents PromptMobile, an acceleration framework that optimizes Promptus for on-device processing. By employing a two-stage generation framework, fine-grained caching, and system-level optimizations, PromptMobile achieves significant speed improvements. This technology is vital for enhancing video streaming quality on mobile devices in areas with limited network connectivity.

Authors:  Liming Liu, Jiangkai Wu, Haoyang Wang, Peiheng Wang, Xinggong Zhang, Zongming Guo

Link:  https://arxiv.org/abs/2503.16112v1

Date: 2025-03-20

Summary:

Traditional video compression algorithms exhibit significant quality degradation at extremely low bitrates. Promptus emerges as a new paradigm for video streaming, substantially cutting down the bandwidth essential for video streaming. However, Promptus is computationally intensive and cannot run in real time on mobile devices. This paper presents PromptMobile, an efficient acceleration framework tailored for on-device Promptus. Specifically, we propose (1) a two-stage efficient generation framework to reduce computational cost by 8.1x, (2) fine-grained inter-frame caching to reduce redundant computations by 16.6%, and (3) system-level optimizations to further enhance efficiency. The evaluations demonstrate that, compared with the original Promptus, PromptMobile achieves a 13.6x increase in image generation speed. Compared with other streaming methods, PromptMobile achieves an average LPIPS improvement of 0.016 (compared with H.265) and reduces severely distorted frames by 60% (compared with VQGAN).

--------------------------------------------------------------------------------------------------------

Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts

Diffusion models have become prevalent in visual generation, and integrating Mixture of Experts (MoE) enhances model scalability. This paper introduces Race-DiT, a novel MoE model for diffusion transformers with a flexible routing strategy called Expert Race. By enabling tokens and experts to dynamically compete, Race-DiT improves expert utilization and model performance. This research is essential for scaling up generative models to handle complex visual tasks, enabling high-resolution image and video generation with enhanced efficiency.

Authors:  Yike Yuan, Ziyu Wang, Zihao Huang, Defa Zhu, Xun Zhou, Jingyi Yu, Qiyang Min

Link:  https://arxiv.org/abs/2503.16057v1

Date: 2025-03-20

Summary:

Diffusion models have emerged as a mainstream framework in visual generation. Building upon this success, the integration of Mixture of Experts (MoE) methods has shown promise in enhancing model scalability and performance. In this paper, we introduce Race-DiT, a novel MoE model for diffusion transformers with a flexible routing strategy, Expert Race. By allowing tokens and experts to compete together and select the top candidates, the model learns to dynamically assign experts to critical tokens. Additionally, we propose per-layer regularization to address challenges in shallow layer learning, and router similarity loss to prevent mode collapse, ensuring better expert utilization. Extensive experiments on ImageNet validate the effectiveness of our approach, showcasing significant performance gains along with promising scaling properties.
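A minimal sketch of the routing idea, assuming plain router logits and a global top-k: tokens and experts compete jointly rather than each token selecting its own top-k experts; normalization, regularization, and load-balancing details are omitted:

```python
import torch

def expert_race_routing(scores: torch.Tensor, k: int) -> torch.Tensor:
    """Global top-k routing over a token-expert affinity matrix.
    scores: (num_tokens, num_experts) router logits.
    Returns a boolean assignment mask of the same shape."""
    flat = scores.flatten()
    topk = torch.topk(flat, k).indices               # tokens and experts compete together
    mask = torch.zeros_like(flat, dtype=torch.bool)
    mask[topk] = True
    return mask.view_as(scores)

# Example: 8 tokens, 4 experts, keep 16 token-expert pairs in total
scores = torch.randn(8, 4)
mask = expert_race_routing(scores, k=16)
```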

--------------------------------------------------------------------------------------------------------

Extract, Match, and Score: An Evaluation Paradigm for Long Question-context-answer Triplets in Financial Analysis

Evaluating large language models (LLMs) on long-form answers in complex domains like financial analysis requires specialized metrics. This paper proposes Extract, Match, and Score (EMS), an evaluation paradigm tailored for long question-context-answer triplets. EMS addresses the limitations of traditional metrics by focusing on the accuracy and relevance of extracted information. This approach is vital for ensuring the reliability of LLMs in critical applications where precise and detailed responses are necessary, such as regulatory compliance and financial reporting.

Authors:  Bo Hu, Han Yuan, Vlad Pandelea, Wuqiong Luo, Yingzhu Zhao, Zheng Ma

Link:  https://arxiv.org/abs/2503.16575v1

Date: 2025-03-20

Summary:

The rapid advancement of large language models (LLMs) has sparked widespread adoption across diverse applications, making robust evaluation frameworks crucial for assessing their performance. While conventional evaluation metrics remain applicable for shorter texts, their efficacy diminishes when evaluating the quality of long-form answers. This limitation is particularly critical in real-world scenarios involving extended questions, extensive context, and long-form answers, such as financial analysis or regulatory compliance. In this paper, we use a practical financial use case to illustrate applications that handle "long question-context-answer triplets". We construct a real-world financial dataset comprising long triplets and demonstrate the inadequacies of traditional metrics. To address this, we propose an effective Extract, Match, and Score (EMS) evaluation approach tailored to the complexities of LLMs' long-form outputs, providing practitioners with a reliable methodology for assessing LLMs' performance in complex real-world scenarios.
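A hypothetical sketch of an Extract-Match-Score style loop; the extraction and matching steps below are crude string-based stand-ins for the LLM-based components a real system would use:

```python
def extract_facts(answer: str) -> list[str]:
    """Extract atomic claims from a long-form answer (an LLM prompt in practice)."""
    return [s.strip() for s in answer.split(".") if s.strip()]

def match(claim: str, reference_facts: list[str]) -> bool:
    """Match a claim against reference facts (string overlap here; an LLM or
    embedding matcher would be used in practice)."""
    return any(claim.lower() in ref.lower() or ref.lower() in claim.lower()
               for ref in reference_facts)

def ems_score(answer: str, reference_facts: list[str]) -> float:
    """Score = fraction of extracted claims supported by the reference."""
    claims = extract_facts(answer)
    if not claims:
        return 0.0
    return sum(match(c, reference_facts) for c in claims) / len(claims)
```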

--------------------------------------------------------------------------------------------------------

Chem42: a Family of chemical Language Models for Target-aware Ligand Generation

Drug discovery requires generative models that can design ligands tailored to specific biological targets. This paper introduces Chem42, a family of chemical language models that integrate target-specific insights. By combining atomic-level interactions with multimodal inputs from protein language models, Chem42 generates ligands with enhanced target specificity. This technology is crucial for accelerating the drug discovery pipeline, enabling the development of novel therapeutics with improved efficacy and reduced side effects.

Authors:  Aahan Singh, Engin Tekin, Maryam Nadeem, Nancy A. ElNaker, Mohammad Amaan Sayeed, Natalia Vassilieva, Boulbaba Ben Amor

Link:  https://arxiv.org/abs/2503.16563v1

Date: 2025-03-20

Summary:

Revolutionizing drug discovery demands more than just understanding molecular interactions - it requires generative models that can design novel ligands tailored to specific biological targets. While chemical Language Models (cLMs) have made strides in learning molecular properties, most fail to incorporate target-specific insights, restricting their ability to drive de-novo ligand generation. Chem42, a cutting-edge family of generative chemical Language Models, is designed to bridge this gap. By integrating atomic-level interactions with multimodal inputs from Prot42, a complementary protein Language Model, Chem42 achieves a sophisticated cross-modal representation of molecular structures, interactions, and binding patterns. This innovative framework enables the creation of structurally valid, synthetically accessible ligands with enhanced target specificity. Evaluations across diverse protein targets confirm that Chem42 surpasses existing approaches in chemical validity, target-aware design, and predicted binding affinity. By reducing the search space of viable drug candidates, Chem42 could accelerate the drug discovery pipeline, offering a powerful generative AI tool for precision medicine. Our Chem42 models set a new benchmark in molecule property prediction, conditional molecule generation, and target-aware ligand design. The models are publicly available at huggingface.co/inceptionai.

--------------------------------------------------------------------------------------------------------

Cube: A Roblox View of 3D Intelligence

Building foundation models for 3D intelligence is crucial for developing immersive and interactive virtual environments. This paper presents a step towards creating such a model for Roblox, focusing on 3D shape tokenization. By enabling text-to-shape, shape-to-text, and text-to-scene generation, this research aims to empower developers in creating complex 3D content. This technology is vital for enhancing the realism and interactivity of virtual worlds, improving user engagement and creativity.

Authors:  Foundation AI Team, Kiran Bhat, Nishchaie Khanna, Karun Channa, Tinghui Zhou, Yiheng Zhu, Xiaoxia Sun, Charles Shang, Anirudh Sudarshan, Maurice Chu, Daiqing Li, Kangle Deng, Jean-Philippe Fauconnier, Tijmen Verhulsdonck, Maneesh Agrawala, Kayvon Fatahalian, Alexander Weiss, Christian Reiser, Ravi Kiran Chirravuri, Ravali Kandur, Alejandro Pelaez, Akash Garg, Michael Palleschi, Jessica Wang, Skylar Litz, Leon Liu, Anying Li, David Harmon, Derek Liu, Liangjun Feng, Denis Goupil, Lukas Kuczynski, Jihyun Yoon, Naveen Marri, Peiye Zhuang, Yinan Zhang, Brian Yin, Haomiao Jiang, Marcel van Workum, Thomas Lane, Bryce Erickson, Salil Pathare, Kyle Price, Anupam Singh, David Baszucki

Link:  https://arxiv.org/abs/2503.15475v1

Date: 2025-03-19

Summary:

Foundation models trained on vast amounts of data have demonstrated remarkable reasoning and generation capabilities in the domains of text, images, audio and video. Our goal at Roblox is to build such a foundation model for 3D intelligence, a model that can support developers in producing all aspects of a Roblox experience, from generating 3D objects and scenes to rigging characters for animation to producing programmatic scripts describing object behaviors. We discuss three key design requirements for such a 3D foundation model and then present our first step towards building such a model. We expect that 3D geometric shapes will be a core data type and describe our solution for a 3D shape tokenizer. We show how our tokenization scheme can be used in applications for text-to-shape generation, shape-to-text generation and text-to-scene generation. We demonstrate how these applications can collaborate with existing large language models (LLMs) to perform scene analysis and reasoning. We conclude with a discussion outlining our path to building a fully unified foundation model for 3D intelligence.

--------------------------------------------------------------------------------------------------------

Do Chains-of-Thoughts of Large Language Models Suffer from Hallucinations, Cognitive Biases, or Phobias in Bayesian Reasoning?

Large language models (LLMs) can aid in learning complex reasoning tasks, but they may exhibit biases and inconsistencies. This paper examines the Chain-of-Thought (CoT) reasoning of LLMs in Bayesian problems, revealing biases towards symbolic reasoning and avoidance of ecologically valid strategies. By identifying these limitations, researchers can develop prompts and training methods to improve the reliability and accuracy of LLMs in educational and decision-making contexts.

Authors:  Roberto Araya

Link:  https://arxiv.org/abs/2503.15268v1

Date: 2025-03-19

Summary:

Learning to reason and carefully explain arguments is central to students' cognitive, mathematical, and computational thinking development. This is particularly challenging in problems under uncertainty and in Bayesian reasoning. With the new generation of large language models (LLMs) capable of reasoning using Chain-of-Thought (CoT), there is an excellent opportunity to learn with them as they explain their reasoning through a dialogue with their artificial internal voice. It is an engaging and excellent opportunity to learn Bayesian reasoning. Furthermore, given that different LLMs sometimes arrive at opposite solutions, CoT generates opportunities for deep learning by detailed comparisons of reasonings. However, unlike humans, we found that they do not autonomously explain using ecologically valid strategies like natural frequencies, whole objects, and embodied heuristics. This is unfortunate, as these strategies help humans avoid critical mistakes and have proven pedagogical value in Bayesian reasoning. In order to overcome these biases and aid understanding and learning, we included prompts that induce LLMs to use these strategies. We found that LLMs with CoT incorporate them but not consistently. They show persistent biases towards symbolic reasoning and avoidance or phobia of ecologically valid strategies.
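To make the contrast concrete, a standard textbook-style example (not from the paper) of the same Bayesian update in symbolic form versus the natural-frequency framing the author advocates:

```latex
% Symbolic Bayes (base rate 1%, sensitivity 90%, false-positive rate 5%):
\[
P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)}
            = \frac{0.9 \times 0.01}{0.9 \times 0.01 + 0.05 \times 0.99} \approx 0.15
\]
% Natural-frequency framing of the same numbers: of 1{,}000 people, 10 have the
% condition and 9 of them test positive; of the 990 without it, about 50 test
% positive. Hence $P(D \mid +) \approx 9 / (9 + 50) \approx 0.15$.
```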

--------------------------------------------------------------------------------------------------------

Foundation models may exhibit staged progression in novel CBRN threat disclosure

Foundation models' ability to disclose novel CBRN threats is a critical safety concern. This paper explores the question by testing models on a novel biothreat before its public disclosure, and the results suggest a staged progression in model capability. Monitoring that progression is crucial for implementing protective measures in time. This research is vital for national security and public safety, ensuring that AI models are responsibly developed and deployed to mitigate potential risks.

Authors:  Kevin M Esvelt

Link:  https://arxiv.org/abs/2503.15182v1

Date: 2025-03-19

Summary:

The extent to which foundation models can disclose novel chemical, biological, radiological, and nuclear (CBRN) threats to expert users is unclear due to a lack of test cases. I leveraged the unique opportunity presented by an upcoming publication describing a novel catastrophic biothreat - "Technical Report on Mirror Bacteria: Feasibility and Risks" - to conduct a small controlled study before it became public. Graduate-trained biologists tasked with predicting the consequences of releasing mirror E. coli showed no significant differences in rubric-graded accuracy using Claude Sonnet 3.5 new (n=10) or web search only (n=2); both groups scored comparably to a web baseline (28 and 43 versus 36). However, Sonnet reasoned correctly when prompted by a report author, but a smaller model, Haiku 3.5, failed even with author guidance (80 versus 5). These results suggest distinct stages of model capability: Haiku is unable to reason about mirror life even with threat-aware expert guidance (Stage 1), while Sonnet correctly reasons only with threat-aware prompting (Stage 2). Continued advances may allow future models to disclose novel CBRN threats to naive experts (Stage 3) or unskilled users (Stage 4). While mirror life represents only one case study, monitoring new models' ability to reason about privately known threats may allow protective measures to be implemented before widespread disclosure.

--------------------------------------------------------------------------------------------------------

Behaviour Discovery and Attribution for Explainable Reinforcement Learning

Explaining the decisions of reinforcement learning (RL) agents is crucial for trust and reliability. This paper proposes a framework for behavior discovery and action attribution in offline RL trajectories. By identifying meaningful behavioral segments, this method provides granular explanations. This approach is essential for deploying RL agents in safety-critical applications, such as autonomous vehicles and medical robotics, where transparency and accountability are paramount.

Authors:  Rishav Rishav, Somjit Nath, Vincent Michalski, Samira Ebrahimi Kahou

Link:  https://arxiv.org/abs/2503.14973v1

Date: 2025-03-19

Summary:

Explaining the decisions made by reinforcement learning (RL) agents is critical for building trust and ensuring reliability in real-world applications. Traditional approaches to explainability often rely on saliency analysis, which can be limited in providing actionable insights. Recently, there has been growing interest in attributing RL decisions to specific trajectories within a dataset. However, these methods often generalize explanations to long trajectories, potentially involving multiple distinct behaviors. Often, providing multiple, more fine-grained explanations would improve clarity. In this work, we propose a framework for behavior discovery and action attribution to behaviors in offline RL trajectories. Our method identifies meaningful behavioral segments, enabling more precise and granular explanations associated with high-level agent behaviors. This approach is adaptable across diverse environments with minimal modifications, offering a scalable and versatile solution for behavior discovery and attribution for explainable RL.

--------------------------------------------------------------------------------------------------------

Frac-Connections: Fractional Extension of Hyper-Connections

Residual connections are vital in deep learning, but Hyper-Connections, while improving performance, increase memory costs. This paper introduces Frac-Connections, which divide hidden states to reduce memory consumption while retaining performance benefits. This research is crucial for training large-scale models, such as 7B MoE models, efficiently.

Authors:  Defa Zhu, Hongzhi Huang, Jundong Zhou, Zihao Huang, Yutao Zeng, Banggu Wu, Qiyang Min, Xun Zhou

Link:  https://arxiv.org/abs/2503.14125v1

Date: 2025-03-18

Summary:

Residual connections are central to modern deep learning architectures, enabling the training of very deep networks by mitigating gradient vanishing. Hyper-Connections recently generalized residual connections by introducing multiple connection strengths at different depths, thereby addressing the seesaw effect between gradient vanishing and representation collapse. However, Hyper-Connections increase memory access costs by expanding the width of hidden states. In this paper, we propose Frac-Connections, a novel approach that divides hidden states into multiple parts rather than expanding their width. Frac-Connections retain partial benefits of Hyper-Connections while reducing memory consumption. To validate their effectiveness, we conduct large-scale experiments on language tasks, with the largest being a 7B MoE model trained on up to 3T tokens, demonstrating that Frac-Connections significantly outperform residual connections.
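A minimal sketch of the core idea, assuming one learned connection strength per fraction for the residual and the layer output; the paper's exact parameterization and depth-wise scheme may differ:

```python
import torch
import torch.nn as nn

class FracConnection(nn.Module):
    """Illustrative sketch: split the hidden state into n fractions and mix each
    fraction of the residual stream with the layer output using learned weights,
    keeping the hidden width unchanged (unlike Hyper-Connections)."""
    def __init__(self, hidden_dim: int, n_frac: int = 4):
        super().__init__()
        assert hidden_dim % n_frac == 0
        self.n_frac = n_frac
        self.alpha = nn.Parameter(torch.ones(n_frac))   # residual strength per fraction
        self.beta = nn.Parameter(torch.ones(n_frac))    # layer-output strength per fraction

    def forward(self, residual: torch.Tensor, layer_out: torch.Tensor) -> torch.Tensor:
        # residual, layer_out: (batch, seq, hidden_dim)
        r = residual.chunk(self.n_frac, dim=-1)
        y = layer_out.chunk(self.n_frac, dim=-1)
        mixed = [self.alpha[i] * r[i] + self.beta[i] * y[i] for i in range(self.n_frac)]
        return torch.cat(mixed, dim=-1)                 # same width as the input
```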

--------------------------------------------------------------------------------------------------------

Interpretable Unsupervised Joint Denoising and Enhancement for Real-World low-light Scenarios

Real-world low-light images suffer from complex degradations. This paper proposes an interpretable, unsupervised framework for joint denoising and enhancement. By using paired sub-images and frequency domain decomposition, this method effectively addresses these degradations. This research is vital for enhancing image quality in surveillance, photography, and medical imaging applications.

Authors:  Huaqiu Li, Xiaowan Hu, Haoqian Wang

Link:  https://arxiv.org/abs/2503.14535v1

Date: 2025-03-17

Summary:

Real-world low-light images often suffer from complex degradations such as local overexposure, low brightness, noise, and uneven illumination. Supervised methods tend to overfit to specific scenarios, while unsupervised methods, though better at generalization, struggle to model these degradations due to the lack of reference images. To address this issue, we propose an interpretable, zero-reference joint denoising and low-light enhancement framework tailored for real-world scenarios. Our method derives a training strategy based on paired sub-images with varying illumination and noise levels, grounded in physical imaging principles and retinex theory. Additionally, we leverage the Discrete Cosine Transform (DCT) to perform frequency domain decomposition in the sRGB space, and introduce an implicit-guided hybrid representation strategy that effectively separates intricate compounded degradations. In the backbone network design, we develop a retinal decomposition network guided by implicit degradation representation mechanisms. Extensive experiments demonstrate the superiority of our method. Code will be available at https://github.com/huaqlili/unsupervised-light-enhance-ICLR2025.
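A small illustration of a DCT-based low/high frequency split in the sRGB domain, using SciPy; the cutoff and per-channel treatment are assumptions, and the paper's decomposition and implicit guidance are more involved:

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_low_high_split(img: np.ndarray, cutoff: int = 16):
    """Split an sRGB image (H, W, 3) into low- and high-frequency components
    via a per-channel 2D DCT. `cutoff` is an illustrative block size."""
    low = np.zeros_like(img, dtype=np.float64)
    high = np.zeros_like(img, dtype=np.float64)
    for c in range(img.shape[2]):
        coeffs = dctn(img[..., c].astype(np.float64), norm="ortho")
        mask = np.zeros_like(coeffs)
        mask[:cutoff, :cutoff] = 1.0                     # keep the low-frequency block
        low[..., c] = idctn(coeffs * mask, norm="ortho")
        high[..., c] = idctn(coeffs * (1 - mask), norm="ortho")
    return low, high
```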

--------------------------------------------------------------------------------------------------------

Ethical Implications of AI in Data Collection: Balancing Innovation with Privacy

AI-driven data collection raises significant ethical and legal concerns. This paper examines recent advancements and regulatory approaches, emphasizing the need for adaptive governance and international cooperation. This research is crucial for ensuring that AI technologies are developed and deployed responsibly, protecting individual rights and societal values.

Authors:  Shahmar Mirishli

Link:  https://arxiv.org/abs/2503.14539v1

Date: 2025-03-17

Summary:

This article examines the ethical and legal implications of artificial intelligence (AI) driven data collection, focusing on developments from 2023 to 2024. It analyzes recent advancements in AI technologies and their impact on data collection practices across various sectors. The study compares regulatory approaches in the European Union, the United States, and China, highlighting the challenges in creating a globally harmonized framework for AI governance. Key ethical issues, including informed consent, algorithmic bias, and privacy protection, are critically assessed in the context of increasingly sophisticated AI systems. The research explores case studies in healthcare, finance, and smart cities to illustrate the practical challenges of AI implementation. It evaluates the effectiveness of current legal frameworks and proposes solutions encompassing legal and policy recommendations, technical safeguards, and ethical frameworks. The article emphasizes the need for adaptive governance and international cooperation to address the global nature of AI development while balancing innovation with the protection of individual rights and societal values.

--------------------------------------------------------------------------------------------------------

Halving transcription time: A fast, user-friendly and GDPR-compliant workflow to create AI-assisted transcripts for content analysis

Qualitative research often involves labor-intensive transcription. This paper presents an AI-assisted workflow that reduces transcription time by up to 46.2% while ensuring GDPR compliance. This method is vital for researchers and students, enhancing productivity and enabling efficient content analysis.

Authors:  Jakob Sponholz, Andreas Weilinghoff, Juliane Schopf

Link:  https://arxiv.org/abs/2503.13031v1

Date: 2025-03-17

Summary:

In qualitative research, data transcription is often labor-intensive and time-consuming. To expedite this process, a workflow utilizing artificial intelligence (AI) was developed. This workflow not only enhances transcription speed but also addresses the issue of AI-generated transcripts often lacking compatibility with standard content analysis software. Within this workflow, automatic speech recognition is employed to create initial transcripts from audio recordings, which are then formatted to be compatible with content analysis software such as ATLAS.ti or MAXQDA. Empirical data from a study of 12 interviews suggests that this workflow can reduce transcription time by up to 46.2%. Furthermore, by using widely used standard software, this process is suitable for both students and researchers while also being adaptable to a variety of learning, teaching, and research environments. It is also particularly beneficial for non-native speakers. In addition, the workflow is GDPR-compliant and facilitates local, offline transcript generation, which is crucial when dealing with sensitive data.
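A hedged sketch of the general pattern (local, offline ASR followed by export to a timestamped plain-text transcript); the specific ASR model and the formatting rules needed by ATLAS.ti or MAXQDA in the paper's workflow may differ:

```python
import whisper

model = whisper.load_model("medium")           # runs locally; no audio leaves the machine
result = model.transcribe("interview_01.wav")

# Write a simple timestamped transcript for import into content analysis software.
with open("interview_01_transcript.txt", "w", encoding="utf-8") as f:
    for seg in result["segments"]:
        start = int(seg["start"])
        f.write(f"[{start // 60:02d}:{start % 60:02d}] {seg['text'].strip()}\n")
```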

--------------------------------------------------------------------------------------------------------

AIMI: Leveraging Future Knowledge and Personalization in Sparse Event Forecasting for Treatment Adherence

Adherence to treatments is crucial for managing chronic conditions. This paper proposes AIMI, a knowledge-guided system that forecasts medication adherence using smartphone sensors and medication history. This research is essential for developing on-demand intervention tools, improving patient outcomes and reducing healthcare costs.

Authors:  Abdullah Mamun, Diane J. Cook, Hassan Ghasemzadeh

Link:  https://arxiv.org/abs/2503.16091v1

Date: 2025-03-20

Summary:

Adherence to prescribed treatments is crucial for individuals with chronic conditions to avoid costly or adverse health outcomes. For certain patient groups, intensive lifestyle interventions are vital for enhancing medication adherence. Accurate forecasting of treatment adherence can open pathways to developing an on-demand intervention tool, enabling timely and personalized support. With the increasing popularity of smartphones and wearables, it is now easier than ever to develop and deploy smart activity monitoring systems. However, effective forecasting systems for treatment adherence based on wearable sensors are still not widely available. We close this gap by proposing Adherence Forecasting and Intervention with Machine Intelligence (AIMI). AIMI is a knowledge-guided adherence forecasting system that leverages smartphone sensors and previous medication history to estimate the likelihood of forgetting to take a prescribed medication. A user study was conducted with 27 participants who took daily medications to manage their cardiovascular diseases. We designed and developed CNN and LSTM-based forecasting models with various combinations of input features and found that LSTM models can forecast medication adherence with an accuracy of 0.932 and an F-1 score of 0.936. Moreover, through a series of ablation studies involving convolutional and recurrent neural network architectures, we demonstrate that leveraging known knowledge about the future and personalized training enhances the accuracy of medication adherence forecasting. Code available: https://github.com/ab9mamun/AIMI.
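A minimal PyTorch sketch of an LSTM classifier over windowed sensor features for next-dose adherence; the feature set, window length, and output head are illustrative, not the paper's configuration:

```python
import torch
import torch.nn as nn

class AdherenceLSTM(nn.Module):
    """Sketch of an LSTM forecaster: a window of sensor features plus history
    -> probability of missing the next scheduled dose."""
    def __init__(self, n_features: int = 16, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time_steps, n_features) from smartphone sensors + medication history
        _, (h_n, _) = self.lstm(x)
        return torch.sigmoid(self.head(h_n[-1]))    # P(missed dose)

model = AdherenceLSTM()
window = torch.randn(8, 120, 16)                    # 8 users, 120 time steps each
p_miss = model(window)                              # shape (8, 1)
```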

--------------------------------------------------------------------------------------------------------

Multimodal Feature-Driven Deep Learning for the Prediction of Duck Body Dimensions and Weight

Accurate body dimension and weight measurements are vital for poultry management. This paper introduces a deep learning model that uses multimodal data to estimate duck body dimensions and weight non-invasively. This research is crucial for optimizing livestock management, reducing animal stress, and improving economic efficiency.

Authors:  Yi Xiao, Qiannan Han, Guiping Liang, Hongyan Zhang, Song Wang, Zhihao Xu, Weican Wan, Chuang Li, Guitao Jiang, Wenbo Xiao

Link:  https://arxiv.org/abs/2503.14001v2

Date: 2025-03-19

Summary:

Accurate body dimension and weight measurements are critical for optimizing poultry management, health assessment, and economic efficiency. This study introduces an innovative deep learning-based model leveraging multimodal data (2D RGB images from different views, depth images, and 3D point clouds) for the non-invasive estimation of duck body dimensions and weight. A dataset of 1,023 Linwu ducks, comprising over 5,000 samples with diverse postures and conditions, was collected to support model training. The proposed method innovatively employs PointNet++ to extract key feature points from point clouds, extracts and computes corresponding 3D geometric features, and fuses them with multi-view convolutional 2D features. A Transformer encoder is then utilized to capture long-range dependencies and refine feature interactions, thereby enhancing prediction robustness. The model achieved a mean absolute percentage error (MAPE) of 6.33% and an R² of 0.953 across eight morphometric parameters, demonstrating strong predictive capability. Unlike conventional manual measurements, the proposed model enables high-precision estimation while eliminating the necessity for physical handling, thereby reducing animal stress and broadening its application scope. This study marks the first application of deep learning techniques to poultry body dimension and weight estimation, providing a valuable reference for the intelligent and precise management of the livestock industry with far-reaching practical significance.
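A rough sketch of the fusion stage only, assuming precomputed point-cloud and multi-view image features; the upstream PointNet++ and CNN extractors and the exact dimensions are not shown:

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    """Sketch: concatenate 3D point-cloud feature tokens and multi-view 2D feature
    tokens, refine them with a Transformer encoder, and regress morphometric
    parameters. Dimensions and pooling are illustrative."""
    def __init__(self, d_model: int = 256, n_outputs: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_outputs)

    def forward(self, pc_feats: torch.Tensor, img_feats: torch.Tensor) -> torch.Tensor:
        # pc_feats: (batch, n_pc_tokens, d_model); img_feats: (batch, n_views, d_model)
        tokens = torch.cat([pc_feats, img_feats], dim=1)
        fused = self.encoder(tokens).mean(dim=1)     # pool over all tokens
        return self.head(fused)                      # eight body-dimension/weight outputs
```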

--------------------------------------------------------------------------------------------------------

Curiosity-Diffuser: Curiosity Guide Diffusion Models for Reliability

Robotic intelligence suffers from instability in neural network models. This paper proposes Curiosity-Diffuser, a method to guide diffusion models to generate reliable trajectories. This research is essential for deploying robots in real-world applications, ensuring safety and performance.

Authors:  Zihao Liu, Xing Liu, Yizhai Zhang, Zhengxiong Liu, Panfeng Huang

Link:  https://arxiv.org/abs/2503.14833v1

Date: 2025-03-19

Summary:

One of the bottlenecks in robotic intelligence is the instability of neural network models, which, unlike control models, lack a well-defined convergence domain and stability. This leads to risks when applying intelligence in the physical world. Specifically, imitation policies based on neural networks may generate hallucinations, leading to inaccurate behaviors that impact the safety of real-world applications. To address this issue, this paper proposes the Curiosity-Diffuser, aimed at guiding the conditional diffusion model to generate trajectories with lower curiosity, thereby improving the reliability of the policy. The core idea is to use a Random Network Distillation (RND) curiosity module to assess whether the model's behavior aligns with the training data, and then minimize curiosity by classifier guidance diffusion to reduce overgeneralization during inference. Additionally, we propose a computationally efficient metric for evaluating the reliability of the policy, measuring the similarity between the generated behaviors and the training dataset, to facilitate research on reliability learning. Finally, simulations verify the effectiveness and applicability of the proposed method in a variety of scenarios, showing that Curiosity-Diffuser significantly improves task performance and produces behaviors that are more similar to the training data. The code for this work is available at: github.com/CarlDegio/Curiosity-Diffuser
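A minimal sketch of the Random Network Distillation curiosity signal: a fixed, randomly initialized target network and a trainable predictor, with prediction error flagging out-of-distribution states; the classifier-guidance coupling to the diffusion sampler is not shown:

```python
import torch
import torch.nn as nn

class RNDCuriosity(nn.Module):
    """RND sketch: curiosity = prediction error of a trainable predictor against a
    fixed random target network. High curiosity flags states unlike the training data."""
    def __init__(self, state_dim: int, feat_dim: int = 64):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                    nn.Linear(128, feat_dim))
        self.predictor = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                       nn.Linear(128, feat_dim))
        for p in self.target.parameters():
            p.requires_grad_(False)                  # target stays fixed

    def curiosity(self, state: torch.Tensor) -> torch.Tensor:
        # Per-state curiosity score; the predictor is trained on in-distribution data only.
        return ((self.predictor(state) - self.target(state)) ** 2).mean(dim=-1)
```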

--------------------------------------------------------------------------------------------------------

The Impact of Artificial Intelligence on Emergency Medicine: A Review of Recent Advances

AI is revolutionizing emergency medicine by enhancing diagnostic processes. This paper reviews recent advances in AI applications for emergency imaging, highlighting its potential to improve patient outcomes. This research is crucial for integrating AI into clinical practice, enhancing the speed and accuracy of medical diagnoses.

Authors:  Gustavo Correia, Victor Alves, Paulo Novais

Link:  https://arxiv.org/abs/2503.14546v1

Date: 2025-03-17

Summary:

Artificial Intelligence (AI) is revolutionizing emergency medicine by enhancing diagnostic processes and improving patient outcomes. This article provides a review of the current applications of AI in emergency imaging studies, focusing on the last five years of advancements. AI technologies, particularly machine learning and deep learning, are pivotal in interpreting complex imaging data, offering rapid, accurate diagnoses and potentially surpassing traditional diagnostic methods. Studies highlighted within the article demonstrate AI's capabilities in accurately detecting conditions such as fractures, pneumothorax, and pulmonary diseases from various imaging modalities including X-rays, CT scans, and MRIs. Furthermore, AI's ability to predict clinical outcomes like mechanical ventilation needs illustrates its potential in crisis resource optimization. Despite these advancements, the integration of AI into clinical practice presents challenges such as data privacy, algorithmic bias, and the need for extensive validation across diverse settings. This review underscores the transformative potential of AI in emergency settings, advocating for a future where AI and clinical expertise synergize to elevate patient care standards.

--------------------------------------------------------------------------------------------------------


EYE ON A.I. GETS READERS UP TO DATE ON THE LATEST FUNDING NEWS AND RELATED ISSUES. SUBSCRIBE FOR THE WEEKLY NEWSLETTER.