Week Ending 2.2.2025

 

RESEARCH WATCH: 2.2.2025

 

GPO-VAE: Modeling Explainable Gene Perturbation Responses utilizing GRN-Aligned Parameter Optimization

In the complex world of biological research, understanding cellular responses to genetic perturbations is crucial for developing targeted therapies. This paper introduces GPO-VAE, an innovative variational autoencoder that brings much-needed explainability to genetic modeling. By incorporating gene regulatory networks (GRNs) into the model's latent space, researchers can now better interpret how genetic changes impact cellular behavior. The approach promises to bridge the gap between complex machine learning techniques and biological understanding, potentially revolutionizing personalized medicine, drug development, and our comprehension of genetic interactions.

Authors:  Seungheun Baek, Soyon Park, Yan Ting Chok, Mogan Gim, Jaewoo Kang

Link:  https://arxiv.org/abs/2501.18973v1

Date: 2025-01-31

Summary:

Motivation: Predicting cellular responses to genetic perturbations is essential for understanding biological systems and developing targeted therapeutic strategies. While variational autoencoders (VAEs) have shown promise in modeling perturbation responses, their limited explainability poses a significant challenge, as the learned features often lack clear biological meaning. Model explainability is nevertheless one of the most important aspects of biological AI, and one of the most effective ways to achieve it is to incorporate gene regulatory networks (GRNs) into the design of deep learning models such as VAEs. GRNs capture the underlying causal relationships between genes and can explain the transcriptional responses caused by genetic perturbation treatments. Results: We propose GPO-VAE, an explainable VAE enhanced by GRN-aligned Parameter Optimization that explicitly models gene regulatory networks in the latent space. Our key approach is to optimize the learnable parameters related to latent perturbation effects towards GRN-aligned explainability. Experimental results on perturbation prediction show our model achieves state-of-the-art performance in predicting transcriptional responses across multiple benchmark datasets. Furthermore, additional results on the GRN inference task reveal our model's ability to generate meaningful GRNs compared to other methods. According to qualitative analysis, GPO-VAE possesses the ability to construct biologically explainable GRNs that align with experimentally validated regulatory pathways. GPO-VAE is available at https://github.com/dmis-lab/GPO-VAE.
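
To give a concrete flavor of the idea, below is a minimal PyTorch sketch of a VAE whose latent dimensions correspond to genes, with a learnable gene-gene interaction matrix that is regularized toward a prior GRN mask. The architecture, parameter names, and loss weighting here are illustrative assumptions on our part, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyGRNAlignedVAE(nn.Module):
    """Illustrative sketch: latent dims map one-to-one to genes, and a learnable
    interaction matrix A (the 'latent perturbation effect' parameters) is pushed
    toward a prior gene regulatory network mask."""
    def __init__(self, n_genes, prior_grn):
        super().__init__()
        self.enc_mu = nn.Linear(n_genes, n_genes)
        self.enc_logvar = nn.Linear(n_genes, n_genes)
        self.dec = nn.Linear(n_genes, n_genes)
        self.A = nn.Parameter(torch.zeros(n_genes, n_genes))
        self.register_buffer("prior", prior_grn)   # 0/1 matrix of known regulatory edges

    def loss(self, x, perturb_mask, lam=1.0):
        mu, logvar = self.enc_mu(x), self.enc_logvar(x)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        z = z + perturb_mask @ self.A              # propagate perturbation effects through A
        recon = self.dec(z)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
        # GRN-aligned parameter optimization: penalize interaction weight outside the prior
        align = (self.A.abs() * (1 - self.prior)).mean()
        return F.mse_loss(recon, x) + kl + lam * align

n_genes = 10
model = ToyGRNAlignedVAE(n_genes, torch.eye(n_genes))
x = torch.randn(4, n_genes)                  # toy expression profiles
perturb = torch.zeros(4, n_genes)
perturb[:, 3] = 1.0                          # gene 3 perturbed in all four samples
print(model.loss(x, perturb).item())
```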

--------------------------------------------------------------------------------------------------------

Building Bridges, Not Walls -- Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution

As artificial intelligence systems become increasingly complex, understanding their inner workings has become a critical challenge. This paper proposes a groundbreaking approach to unifying different methods of attributing model behavior, focusing on input features, training data, and internal model components. By demonstrating the fundamental similarities across these attribution techniques, the researchers aim to create a more accessible and coherent landscape of interpretability research. Their work could significantly advance our ability to understand, debug, and improve AI systems, making them more transparent and trustworthy across various domains.

Authors:  Shichang Zhang, Tessa Han, Usha Bhalla, Hima Lakkaraju

Link:  https://arxiv.org/abs/2501.18887v1

Date: 2025-01-31

Summary:

The increasing complexity of AI systems has made understanding their behavior a critical challenge. Numerous methods have been developed to attribute model behavior to three key aspects: input features, training data, and internal model components. However, these attribution methods are studied and applied rather independently, resulting in a fragmented landscape of approaches and terminology. This position paper argues that feature, data, and component attribution methods share fundamental similarities, and bridging them can benefit interpretability research. We conduct a detailed analysis of successful methods across three domains and present a unified view to demonstrate that these seemingly distinct methods employ similar approaches, such as perturbations, gradients, and linear approximations, differing primarily in their perspectives rather than core techniques. Our unified perspective enhances understanding of existing attribution methods, identifies shared concepts and challenges, makes this field more accessible to newcomers, and highlights new directions not only for attribution and interpretability but also for broader AI research, including model editing, steering, and regulation.
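
To make the shared core concrete, here is a toy sketch (our own, not from the paper) showing that for a purely linear model, a perturbation-based occlusion attribution and a gradient-times-input attribution produce identical feature scores, which is the kind of underlying equivalence the authors' unified view highlights.

```python
import numpy as np

# Toy model: a linear scorer f(x) = w.x. For such a model, a perturbation-based
# (occlusion) attribution and a gradient-times-input attribution coincide,
# illustrating the shared "local linear approximation" at the heart of both views.
rng = np.random.default_rng(0)
w = rng.normal(size=5)          # stand-in for model parameters
x = rng.normal(size=5)          # stand-in for an input's features
f = lambda v: w @ v

# Perturbation view: score drop when feature i is replaced by a zero baseline
occlusion = np.array([f(x) - f(np.where(np.arange(5) == i, 0.0, x)) for i in range(5)])

# Gradient view: the gradient of f is w, so gradient-times-input is w * x
grad_times_input = w * x

print(np.allclose(occlusion, grad_times_input))  # True for a linear model
```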

--------------------------------------------------------------------------------------------------------

Pitfalls of defacing whole-head MRI: re-identification risk with diffusion models and compromised research potential

Medical imaging privacy and research integrity are at the heart of this study examining the effectiveness of MRI image defacing techniques. As researchers seek to protect patient identities, the study reveals surprising vulnerabilities in current defacing methods. Using advanced diffusion probabilistic models, the researchers demonstrated the potential to reconstruct faces from defaced images. Moreover, they found that defacing might not only fail to protect privacy but could also eliminate valuable research information. This research has critical implications for medical data sharing, privacy protection, and the preservation of research potential.

Authors:  Chenyu Gao, Kaiwen Xu, Michael E. Kim, Lianrui Zuo, Zhiyuan Li, Derek B. Archer, Timothy J. Hohman, Ann Zenobia Moore, Luigi Ferrucci, Lori L. Beason-Held, Susan M. Resnick, Christos Davatzikos, Jerry L. Prince, Bennett A. Landman

Link:  https://arxiv.org/abs/2501.18834v1

Date: 2025-01-31

Summary:

Defacing is often applied to head magnetic resonance image (MRI) datasets prior to public release to address privacy concerns. The alteration of facial and nearby voxels has provoked discussions about the true capability of these techniques to ensure privacy as well as their impact on downstream tasks. With advancements in deep generative models, the extent to which defacing can protect privacy is uncertain. Additionally, while the altered voxels are known to contain valuable anatomical information, their potential to support research beyond the anatomical regions directly affected by defacing remains uncertain. To evaluate these considerations, we develop a refacing pipeline that recovers faces in defaced head MRIs using cascaded diffusion probabilistic models (DPMs). The DPMs are trained on images from 180 subjects and tested on images from 484 unseen subjects, 469 of whom are from a different dataset. To assess whether the altered voxels in defacing contain universally useful information, we also predict computed tomography (CT)-derived skeletal muscle radiodensity from facial voxels in both defaced and original MRIs. The results show that DPMs can generate high-fidelity faces that resemble the original faces from defaced images, with surface distances to the original faces significantly smaller than those of a population average face (p < 0.05). This performance also generalizes well to previously unseen datasets. For skeletal muscle radiodensity predictions, using defaced images results in significantly weaker Spearman's rank correlation coefficients compared to using original images (p < 10^-4). For shin muscle, the correlation is statistically significant (p < 0.05) when using original images but not statistically significant (p > 0.05) when any defacing method is applied, suggesting that defacing might not only fail to protect privacy but also eliminate valuable information.

--------------------------------------------------------------------------------------------------------

R.I.P.: Better Models by Survival of the Fittest Prompts

Training data quality is a fundamental driver of model performance, and this paper introduces an innovative method for evaluating and improving training datasets. The Rejecting Instruction Preferences (RIP) approach focuses on identifying and filtering low-quality prompts by measuring the variance and quality of rejected responses. By applying this method to existing training sets, researchers demonstrated significant performance improvements across various benchmarks. This technique could revolutionize how we prepare and curate training data for language models, potentially leading to more reliable, accurate, and consistent AI systems.

Authors:  Ping Yu, Weizhe Yuan, Olga Golovneva, Tianhao Wu, Sainbayar Sukhbaatar, Jason Weston, Jing Xu

Link:  https://arxiv.org/abs/2501.18578v1

Date: 2025-01-30

Summary:

Training data quality is one of the most important drivers of final model quality. In this work, we introduce a method for evaluating data integrity based on the assumption that low-quality input prompts result in high-variance, low-quality responses. This is achieved by measuring the rejected response quality and the reward gap between the chosen and rejected preference pair. Our method, Rejecting Instruction Preferences (RIP), can be used to filter prompts from existing training sets or to build high-quality synthetic datasets, yielding large performance gains across various benchmarks compared to unfiltered data. Using Llama 3.1-8B-Instruct, RIP improves AlpacaEval2 LC Win Rate by 9.4%, Arena-Hard by 8.7%, and WildBench by 9.9%. Using Llama 3.3-70B-Instruct, RIP improves Arena-Hard from 67.5 to 82.9, moving it from 18th to 6th place overall on the leaderboard.
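
A minimal sketch of the filtering idea is below, assuming a generic reward model and chosen/rejected preference pairs; the function names, thresholds, and exact scoring rule are hypothetical stand-ins rather than the paper's procedure.

```python
def rip_filter(prompt_pairs, reward, min_rejected_reward=0.3, max_gap=0.5):
    """Keep prompts whose rejected response is still reasonably good and whose
    chosen-rejected reward gap is small (a proxy for low response variance)."""
    kept = []
    for prompt, chosen, rejected in prompt_pairs:
        r_chosen, r_rejected = reward(prompt, chosen), reward(prompt, rejected)
        if r_rejected >= min_rejected_reward and (r_chosen - r_rejected) <= max_gap:
            kept.append((prompt, chosen, rejected))
    return kept

# Toy usage with a dummy length-based "reward model"
pairs = [("2+2?", "4", "four"), ("write a poem", "roses are red...", "idk")]
print(rip_filter(pairs, reward=lambda p, r: min(len(r) / 10, 1.0)))
```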

--------------------------------------------------------------------------------------------------------

ASAP: Learning Generalizable Online Bin Packing via Adaptive Selection After Pruning

Solving complex optimization problems like 3D bin packing has been challenging for deep reinforcement learning approaches due to generalization issues. This paper introduces ASAP, a novel method that decomposes decision-making into pruning and selection policies. By using meta-learning and a unique training scheme, ASAP demonstrates excellent capabilities in adapting to different bin packing scenarios. The research has potential applications in logistics, supply chain management, warehouse optimization, and automated packaging systems, offering a more flexible and adaptable approach to solving complex spatial arrangement problems.

Authors:  Han Fang, Paul Weng, Yutong Ban

Link:  https://arxiv.org/abs/2501.17377v1

Date: 2025-01-29

Summary:

Recently, deep reinforcement learning (DRL) has achieved promising results in solving online 3D Bin Packing Problems (3D-BPP). However, these DRL-based policies may perform poorly on new instances due to distribution shift. Besides generalization, we also consider adaptation, completely overlooked by previous work, which aims at rapidly finetuning these policies to a new test distribution. To tackle both generalization and adaptation issues, we propose Adaptive Selection After Pruning (ASAP), which decomposes a solver's decision-making into two policies, one for pruning and one for selection. The role of the pruning policy is to remove inherently bad actions, which allows the selection policy to choose among the remaining most valuable actions. To learn these policies, we propose a training scheme based on a meta-learning phase of both policies followed by a finetuning phase of the sole selection policy to rapidly adapt it to a test distribution. Our experiments demonstrate that ASAP exhibits excellent generalization and adaptation capabilities on in-distribution and out-of-distribution instances under both discrete and continuous setups.
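
The decision decomposition is easy to picture with a toy sketch: one policy scores and discards inherently bad candidate placements, and a second policy chooses among the survivors. The dummy policies and names below are illustrative, not the authors' DRL setup.

```python
import numpy as np

# Toy pruning-then-selection decomposition with random stand-in policies.
def prune_then_select(state, actions, prune_policy, select_policy, keep=5):
    prune_scores = prune_policy(state, actions)          # low score = inherently bad placement
    survivors = [actions[i] for i in np.argsort(prune_scores)[-keep:]]
    select_scores = select_policy(state, survivors)      # only this head would be finetuned at test time
    return survivors[int(np.argmax(select_scores))]

rng = np.random.default_rng(0)
candidate_placements = list(range(20))                   # stand-ins for bin-packing placements
prune_policy = lambda s, a: rng.random(len(a))
select_policy = lambda s, a: rng.random(len(a))
print(prune_then_select(None, candidate_placements, prune_policy, select_policy))
```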

--------------------------------------------------------------------------------------------------------

AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders

As language models become more powerful, the ability to fine-tune and control their outputs becomes increasingly important. This paper introduces AxBench, a comprehensive benchmark for evaluating various techniques to steer language model responses. By comparing methods like prompting, finetuning, and representation-based techniques, the researchers provide insights into the most effective approaches for concept detection and output control. Their work is crucial for improving AI safety, reliability, and interpretability, with potential applications in content moderation, personalized AI assistants, and ethical AI development.

Authors:  Zhengxuan Wu, Aryaman Arora, Atticus Geiger, Zheng Wang, Jing Huang, Dan Jurafsky, Christopher D. Manning, Christopher Potts

Link:  https://arxiv.org/abs/2501.17148v2

Date: 2025-01-29

Summary:

Fine-grained steering of language model outputs is essential for safety and reliability. Prompting and finetuning are widely used to achieve these goals, but interpretability researchers have proposed a variety of representation-based techniques as well, including sparse autoencoders (SAEs), linear artificial tomography, supervised steering vectors, linear probes, and representation finetuning. At present, there is no benchmark for making direct comparisons between these proposals. Therefore, we introduce AxBench, a large-scale benchmark for steering and concept detection, and report experiments on Gemma-2-2B and 9B. For steering, we find that prompting outperforms all existing methods, followed by finetuning. For concept detection, representation-based methods, such as difference-in-means, perform best. On both evaluations, SAEs are not competitive. We introduce a novel weakly-supervised representational method (Rank-1 Representation Finetuning; ReFT-r1), which is competitive on both tasks while providing the interpretability advantages that prompting lacks. Along with AxBench, we train and publicly release SAE-scale feature dictionaries for ReFT-r1 and DiffMean.
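
For readers unfamiliar with difference-in-means, the sketch below shows the basic recipe on synthetic activations: the steering direction is the mean hidden state on concept-positive prompts minus the mean on negatives, and the same vector doubles as a linear concept detector. This is our toy illustration, not AxBench code or real Gemma activations.

```python
import numpy as np

# Difference-in-means (DiffMean) on synthetic "hidden states".
rng = np.random.default_rng(0)
d = 16
pos = rng.normal(loc=0.5, size=(100, d))   # activations where the concept is present
neg = rng.normal(loc=0.0, size=(100, d))   # activations where it is absent

direction = pos.mean(axis=0) - neg.mean(axis=0)

def steer(hidden, alpha=2.0):
    """Push a hidden state toward the concept by adding the scaled direction."""
    return hidden + alpha * direction

def detect(hidden):
    """Score a hidden state by its projection onto the direction (concept detection)."""
    return hidden @ direction

print(detect(pos[:3]).round(2), detect(neg[:3]).round(2))  # positives score higher on average
```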

--------------------------------------------------------------------------------------------------------

Transfer of Knowledge through Reverse Annealing: A Preliminary Analysis of the Benefits and What to Share

In the emerging field of quantum computing, researchers are exploring innovative methods to improve optimization problem-solving. This preliminary study investigates the potential of reverse annealing as a mechanism for knowledge transfer between similar problems. By focusing on the well-known Knapsack Problem, the researchers aim to understand how good states found in one problem might help refine solutions in related challenges. This research could be pivotal in developing more efficient quantum computing strategies, with implications for complex optimization tasks in fields like logistics, finance, and machine learning.

Authors:  Eneko Osaba, Esther Villar-Rodriguez

Link:  https://arxiv.org/abs/2501.15865v1

Date: 2025-01-27

Summary:

In the current NISQ era, quantum annealers present limitations for solving optimization problems efficiently. To mitigate these limitations, D-Wave Systems developed a mechanism called Reverse Annealing, a specific type of quantum annealing designed to perform local refinement of good states found elsewhere. Despite the research activity around Reverse Annealing, no prior work has theorized about the possible benefits of knowledge transfer under this paradigm. This work moves in that direction and is driven by experimentation focused on answering two key research questions: i) is reverse annealing a paradigm that can benefit from knowledge transfer between similar problems? and ii) can we infer the characteristics that an input solution should meet to help increase the probability of success? To properly guide the tests in this paper, the well-known Knapsack Problem has been chosen for benchmarking purposes, using a total of 34 instances composed of 14 and 16 items.
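
The ingredients of such an experiment are easy to sketch: solve one knapsack instance, then reuse its solution as the initial state for a reverse anneal on a similar instance. In the toy sketch below, the variable names, schedule shape, and initial-state encoding are illustrative assumptions, and the actual QUBO encoding and submission to a D-Wave annealer are omitted.

```python
import itertools

# Toy knapsack instance, solved by brute force to obtain a "good state".
values = [6, 5, 8, 9, 7]
weights = [2, 3, 4, 5, 3]
capacity = 9

best = max(
    (x for x in itertools.product([0, 1], repeat=len(values))
     if sum(w * b for w, b in zip(weights, x)) <= capacity),
    key=lambda x: sum(v * b for v, b in zip(values, x)),
)

# Knowledge transfer: the good state found above seeds the local refinement of a
# related instance (e.g., the same items with a slightly different capacity).
initial_state = {f"x{i}": bit for i, bit in enumerate(best)}
anneal_schedule = [(0.0, 1.0), (5.0, 0.45), (15.0, 0.45), (20.0, 1.0)]  # dip, pause, re-anneal
print(initial_state, anneal_schedule)
```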

--------------------------------------------------------------------------------------------------------

s1: Simple test-time scaling

Test-time scaling represents a promising approach to improving language model performance by using additional computational resources during inference. This paper introduces a simple yet effective method for enhancing reasoning capabilities, focusing on a small, curated dataset and a novel "budget forcing" technique. By controlling the model's thinking process and encouraging self-checking, the researchers achieved significant improvements in mathematical reasoning performance. This work could be transformative for AI systems requiring complex reasoning, such as scientific research, educational tools, and advanced problem-solving applications.

Authors:  Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, Tatsunori Hashimoto

Link:  https://arxiv.org/abs/2501.19393v1

Date: 2025-01-31

Summary:

Test-time scaling is a promising new approach to language modeling that uses extra test-time compute to improve performance. Recently, OpenAI's o1 model showed this capability but did not publicly share its methodology, leading to many replication efforts. We seek the simplest approach to achieve test-time scaling and strong reasoning performance. First, we curate a small dataset s1K of 1,000 questions paired with reasoning traces relying on three criteria we validate through ablations: difficulty, diversity, and quality. Second, we develop budget forcing to control test-time compute by forcefully terminating the model's thinking process or lengthening it by appending "Wait" multiple times to the model's generation when it tries to end. This can lead the model to double-check its answer, often fixing incorrect reasoning steps. After supervised finetuning the Qwen2.5-32B-Instruct language model on s1K and equipping it with budget forcing, our model s1 exceeds o1-preview on competition math questions by up to 27% (MATH and AIME24). Further, scaling s1 with budget forcing allows extrapolating beyond its performance without test-time intervention: from 50% to 57% on AIME24. Our model, data, and code are open-source at https://github.com/simplescaling/s1.
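
The budget-forcing loop itself is simple enough to sketch. The snippet below assumes a hypothetical token-level helper generate(context, stop, max_tokens) that returns a list of tokens; it mirrors the behavior described in the abstract rather than the released s1 code, and the delimiter names are ours.

```python
# Minimal budget-forcing sketch (hypothetical generate helper, illustrative delimiters).
END_THINK, WAIT = "<end_thinking>", "Wait"

def budget_forced_reasoning(generate, prompt, min_tokens=200, max_tokens=2000):
    trace = []
    while len(trace) < max_tokens:
        chunk = generate(prompt + " ".join(trace), stop=END_THINK,
                         max_tokens=max_tokens - len(trace))
        trace.extend(chunk)
        if len(trace) < min_tokens:
            trace.append(WAIT)   # suppress the stop; nudges the model to double-check itself
        else:
            break                # thinking ended naturally and the minimum budget is met
    trace.append(END_THINK)      # close the thinking phase (naturally or forced at the cap)
    return trace
```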

--------------------------------------------------------------------------------------------------------

Understanding Long Videos via LLM-Powered Entity Relation Graphs

Video understanding is a complex challenge in artificial intelligence, particularly for extended footage with dynamic object interactions. This paper presents GraphVideoAgent, an innovative system that uses graph-based object tracking and large language models to improve video analysis. By creating dynamic entity relationship graphs, the approach enables more nuanced tracking of objects across time, significantly improving frame selection and contextual understanding. Potential applications include advanced surveillance systems, content recommendation, automated video summarization, and assistive technologies for visually impaired individuals.

Authors:  Meng Chu, Yicong Li, Tat-Seng Chua

Link:  https://arxiv.org/abs/2501.15953v1

Date: 2025-01-27

Summary:

The analysis of extended video content poses unique challenges in artificial intelligence, particularly when dealing with the complexity of tracking and understanding visual elements across time. Current methodologies that process video frames sequentially struggle to maintain coherent tracking of objects, especially when these objects temporarily vanish and later reappear in the footage. A critical limitation of these approaches is their inability to effectively identify crucial moments in the video, largely due to their limited grasp of temporal relationships. To overcome these obstacles, we present GraphVideoAgent, a cutting-edge system that leverages the power of graph-based object tracking in conjunction with large language model capabilities. At its core, our framework employs a dynamic graph structure that maps and monitors the evolving relationships between visual entities throughout the video sequence. This innovative approach enables more nuanced understanding of how objects interact and transform over time, facilitating improved frame selection through comprehensive contextual awareness. Our approach demonstrates remarkable effectiveness when tested against industry benchmarks. In evaluations on the EgoSchema dataset, GraphVideoAgent achieved a 2.2 improvement over existing methods while requiring analysis of only 8.2 frames on average. Similarly, testing on the NExT-QA benchmark yielded a 2.0 performance increase with an average frame requirement of 8.1. These results underscore the efficiency of our graph-guided methodology in enhancing both accuracy and computational performance in long-form video understanding tasks.
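
A stripped-down sketch of the dynamic entity-relation graph idea is shown below: entity IDs persist across disappearances, so a re-appearing object maps back to its original node, and relation edges indexed by frame suggest which frames to re-inspect for a question. The class, matching rule, and example are illustrative assumptions, not the paper's system.

```python
from collections import defaultdict

class EntityGraph:
    """Toy dynamic entity-relation graph built up frame by frame."""
    def __init__(self):
        self.nodes = {}                       # entity_id -> last known attributes
        self.edges = defaultdict(list)        # (id_a, id_b) -> [(frame, relation), ...]

    def update(self, frame_idx, detections, relations):
        for ent_id, attrs in detections.items():
            self.nodes[ent_id] = attrs        # re-appearing entities reuse their node
        for a, b, rel in relations:
            self.edges[(a, b)].append((frame_idx, rel))

    def frames_about(self, ent_id):
        """Frames worth re-inspecting for a question about one entity."""
        return sorted({f for (a, b), evs in self.edges.items()
                       for f, _ in evs if ent_id in (a, b)})

g = EntityGraph()
g.update(3, {"cup": {"color": "red"}, "person": {}}, [("person", "cup", "holds")])
g.update(42, {"cup": {"color": "red"}}, [("person", "cup", "puts down")])
print(g.frames_about("cup"))   # [3, 42]
```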

--------------------------------------------------------------------------------------------------------

Limits to AI Growth: The Ecological and Social Consequences of Scaling

As artificial intelligence continues to advance rapidly, this paper provides a critical examination of the ecological, economic, and social consequences of AI scaling. By applying system dynamics concepts, the researchers explore the complex relationships between technological development and its broader impacts. The study highlights how current AI scaling approaches may externalize environmental and social damages while benefiting large tech companies. This research is crucial for promoting more sustainable and responsible AI development, encouraging a more holistic approach to technological innovation.

Authors:  Eshta Bhardwaj, Rohan Alexander, Christoph Becker

Link:  https://arxiv.org/abs/2501.17980v1

Date: 2025-01-29

Summary:

The accelerating development and deployment of AI technologies depend on the continued ability to scale their infrastructure. This has implied increasing amounts of monetary investment and natural resources. Frontier AI applications have thus resulted in rising financial, environmental, and social costs. While the factors that AI scaling depends on reach their limits, the push for its accelerated advancement and entrenchment continues. In this paper, we provide a holistic review of AI scaling using four lenses (technical, economic, ecological, and social) and review the relationships between these lenses to explore the dynamics of AI growth. We do so by drawing on system dynamics concepts including archetypes such as "limits to growth" to model the dynamic complexity of AI scaling and synthesize several perspectives. Our work maps out the entangled relationships between the technical, economic, ecological and social perspectives and the apparent limits to growth. The analysis explains how industry's responses to external limits enable continued (but temporary) scaling and how this benefits Big Tech while externalizing social and environmental damages. To avoid an "overshoot and collapse" trajectory, we advocate for realigning priorities and norms around scaling to prioritize sustainable and mindful advancements.

--------------------------------------------------------------------------------------------------------

Enhancing Soft Skills in Network Management Education: A Study on the Impact of GenAI-based Virtual Assistants

In the rapidly evolving landscape of technology education, this paper explores the innovative integration of generative AI-based virtual assistants in network management courses. Recognizing the growing importance of soft skills like critical thinking and problem-solving, the researchers aim to assess how AI tools can enhance student learning experiences. By leveraging advanced artificial intelligence, the study seeks to bridge the gap between technical knowledge and essential interpersonal skills. This approach could revolutionize educational methodologies, providing students with more interactive, personalized learning experiences that develop crucial skills for the digital workplace.

Authors:  Dimitris Pantazatos, Mary Grammatikou, Vasilis Maglaris

Link:  https://arxiv.org/abs/2501.16901v1

Date: 2025-01-28

Summary:

The rapid evolution of technology in educational settings has opened new avenues for enhancing learning experiences, particularly in specialized fields like network management. This paper explores the novel integration of a GenAI-based virtual assistant in a university-level network management course, focusing on its impact on developing students' soft skills, notably critical thinking and problem-solving abilities. Recognizing the increasing importance of these skills in the digital age, our study aims to assess the empirical effectiveness of this artificial intelligence-driven educational tool in fostering these competencies among students.

--------------------------------------------------------------------------------------------------------

The TIP of the Iceberg: Revealing a Hidden Class of Task-In-Prompt Adversarial Attacks on LLMs

Cybersecurity in the age of large language models faces new and sophisticated challenges. This research unveils a novel class of adversarial attacks called Task-in-Prompt (TIP) attacks, which exploit LLM vulnerabilities by embedding complex tasks within prompts to generate prohibited content. By introducing the PHRYGE benchmark, the researchers systematically demonstrate these attacks' effectiveness across six state-of-the-art language models. The study highlights critical weaknesses in current AI safety alignments, underscoring the urgent need for more robust defense strategies. This research is crucial for developing more secure and trustworthy AI systems across various applications.

Authors:  Sergey Berezin, Reza Farahbakhsh, Noel Crespi

Link:  https://arxiv.org/abs/2501.18626v1

Date: 2025-01-27

Summary:

We present a novel class of jailbreak adversarial attacks on LLMs, termed Task-in-Prompt (TIP) attacks. Our approach embeds sequence-to-sequence tasks (e.g., cipher decoding, riddles, code execution) into the model's prompt to indirectly generate prohibited inputs. To systematically assess the effectiveness of these attacks, we introduce the PHRYGE benchmark. We demonstrate that our techniques successfully circumvent safeguards in six state-of-the-art language models, including GPT-4o and LLaMA 3.2. Our findings highlight critical weaknesses in current LLM safety alignments and underscore the urgent need for more sophisticated defence strategies. Warning: this paper contains examples of unethical inquiries used solely for research purposes.

--------------------------------------------------------------------------------------------------------

Efficient Reasoning with Hidden Thinking

Chain-of-Thought reasoning has transformed complex problem-solving in multimodal large language models, but its verbosity introduces significant inefficiencies. The Heima framework offers an innovative solution by condensing reasoning processes into compact, hidden-space representations. By using a single thinking token and developing specialized encoder and decoder mechanisms, the approach maintains reasoning accuracy while dramatically reducing computational overhead. This research could be transformative for AI systems requiring complex reasoning, potentially improving performance in fields like scientific research, educational technologies, and advanced problem-solving applications that demand both efficiency and interpretability.

Authors:  Xuan Shen, Yizhou Wang, Xiangxi Shi, Yanzhi Wang, Pu Zhao, Jiuxiang Gu

Link:  https://arxiv.org/abs/2501.19201v1

Date: 2025-01-31

Summary:

Chain-of-Thought (CoT) reasoning has become a powerful framework for improving complex problem-solving capabilities in Multimodal Large Language Models (MLLMs). However, the verbose nature of textual reasoning introduces significant inefficiencies. In this work, we propose Heima (as hidden llama), an efficient reasoning framework that leverages reasoning CoTs in hidden latent space. We design the Heima Encoder to condense each intermediate CoT into a compact, higher-level hidden representation using a single thinking token, effectively minimizing verbosity and reducing the overall number of tokens required during the reasoning process. Meanwhile, we design a corresponding Heima Decoder with traditional Large Language Models (LLMs) to adaptively interpret the hidden representations into variable-length textual sequences, reconstructing reasoning processes that closely resemble the original CoTs. Experimental results across diverse reasoning MLLM benchmarks demonstrate that the Heima model achieves higher generation efficiency while maintaining or even improving zero-shot task accuracy. Moreover, the effective reconstruction of multimodal reasoning processes with the Heima Decoder validates both the robustness and interpretability of our approach.
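
The encode-into-one-token, decode-back-out pattern can be illustrated with a toy recurrent sketch: an encoder compresses a verbose reasoning trace into a single hidden vector (the "thinking token"), and a decoder expands that vector back into an explicit token sequence. The architecture below is our own minimal stand-in, far simpler than the MLLM-based Heima Encoder and Decoder.

```python
import torch
import torch.nn as nn

class ToyThoughtCondenser(nn.Module):
    """Toy sketch: compress a chain-of-thought token sequence into one hidden
    'thinking token', then decode a textual trace back from it."""
    def __init__(self, vocab_size=100, d=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.decoder = nn.GRU(d, d, batch_first=True)
        self.lm_head = nn.Linear(d, vocab_size)

    def forward(self, cot_ids, max_decode_len=8):
        _, h = self.encoder(self.embed(cot_ids))      # h: (1, B, d), the single thinking token
        state, inp = h, h.transpose(0, 1)              # seed the decoder with the thinking token
        decoded = []
        for _ in range(max_decode_len):                # reconstruct an explicit reasoning trace
            out, state = self.decoder(inp, state)
            token = self.lm_head(out).argmax(-1)       # (B, 1)
            decoded.append(token)
            inp = self.embed(token)
        return h.squeeze(0), torch.cat(decoded, dim=1)

model = ToyThoughtCondenser()
cot = torch.randint(0, 100, (2, 20))                   # two verbose CoT traces of 20 tokens each
thinking_token, reconstruction = model(cot)
print(thinking_token.shape, reconstruction.shape)      # (2, 64) and (2, 8)
```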

--------------------------------------------------------------------------------------------------------

From tools to thieves: Measuring and understanding public perceptions of AI through crowdsourced metaphors

Understanding public attitudes towards artificial intelligence is crucial as the technology becomes increasingly prevalent. This comprehensive study collected over 12,000 responses from a nationally representative U.S. sample, using metaphor analysis to uncover how people conceptualize AI. By employing advanced language modeling techniques, the researchers mapped perceptions of AI's human-likeness, warmth, and competence. The study reveals fascinating insights into how different demographic groups perceive AI, with significant implications for technology development, trust-building, and inclusive AI design. This research provides a critical lens for understanding and addressing societal concerns about emerging technologies.

Authors:  Myra Cheng, Angela Y. Lee, Kristina Rapuano, Kate Niederhoffer, Alex Liebscher, Jeffrey Hancock

Link:  https://arxiv.org/abs/2501.18045v1

Date: 2025-01-29

Summary:

How has the public responded to the increasing prevalence of artificial intelligence (AI)-based technologies? We investigate public perceptions of AI by collecting over 12,000 responses over 12 months from a nationally representative U.S. sample. Participants provided open-ended metaphors reflecting their mental models of AI, a methodology that overcomes the limitations of traditional self-reported measures. Using a mixed-methods approach combining quantitative clustering and qualitative coding, we identify 20 dominant metaphors shaping public understanding of AI. To analyze these metaphors systematically, we present a scalable framework integrating language modeling (LM)-based techniques to measure key dimensions of public perception: anthropomorphism (attribution of human-like qualities), warmth, and competence. We find that Americans generally view AI as warm and competent, and that over the past year, perceptions of AI's human-likeness and warmth have significantly increased (+34%, r = 0.80, p < 0.01; +41%, r = 0.62, p < 0.05). Furthermore, these implicit perceptions, along with the identified dominant metaphors, strongly predict trust in and willingness to adopt AI (r^2 = 0.21, 0.18, p < 0.001). We further explore how differences in metaphors and implicit perceptions, such as the higher propensity of women, older individuals, and people of color to anthropomorphize AI, shed light on demographic disparities in trust and adoption. In addition to our dataset and framework for tracking evolving public attitudes, we provide actionable insights on using metaphors for inclusive and responsible AI development.

--------------------------------------------------------------------------------------------------------

BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation -- Challenges and Insights

Text-to-speech (TTS) technology faces unique challenges with complex languages like Taiwanese Mandarin, particularly in polyphone disambiguation. The BreezyVoice system addresses these challenges by integrating advanced technologies including a large language model, optimal-transport conditional flow matching, and sophisticated phonetic prediction. By focusing on realistic speech generation and code-switching contexts, the research offers significant improvements in neural codec TTS systems. This work has profound implications for language preservation, assistive technologies, and creating more natural human-computer interactions, especially for languages with complex phonetic structures.

Authors:  Chan-Jan Hsu, Yi-Cheng Lin, Chia-Chun Lin, Wei-Chih Chen, Ho Lam Chung, Chen-An Li, Yi-Chang Chen, Chien-Yu Yu, Ming-Ji Lee, Chien-Cheng Chen, Ru-Heng Huang, Hung-yi Lee, Da-Shan Shiu

Link:  https://arxiv.org/abs/2501.17790v1

Date: 2025-01-29

Summary:

We present BreezyVoice, a Text-to-Speech (TTS) system specifically adapted for Taiwanese Mandarin, highlighting phonetic control abilities to address the unique challenges of polyphone disambiguation in the language. Building upon CosyVoice, we incorporate an S3 tokenizer, a large language model (LLM), an optimal-transport conditional flow matching model (OT-CFM), and a grapheme-to-phoneme prediction model to generate realistic speech that closely mimics human utterances. Our evaluation demonstrates BreezyVoice's superior performance in both general and code-switching contexts, highlighting its robustness and effectiveness in generating high-fidelity speech. Additionally, we address the challenges of generalizability in modeling long-tail speakers and polyphone disambiguation. Our approach significantly enhances performance and offers valuable insights into the workings of neural codec TTS systems.
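
To see why polyphone disambiguation matters, the tiny sketch below shows a word-level lexicon overriding a character-level default so the same character receives different readings in different contexts. This is a purely illustrative rule-based toy; BreezyVoice uses a learned grapheme-to-phoneme predictor rather than a lookup table.

```python
# Toy Mandarin g2p with polyphone disambiguation via longest-match word lookup.
CHAR_DEFAULT = {"行": "xing2", "银": "yin2", "人": "ren2"}
WORD_LEXICON = {"银行": ["yin2", "hang2"],    # "bank": here 行 is read hang2
                "行人": ["xing2", "ren2"]}    # "pedestrian": here 行 is read xing2

def g2p(text):
    phones, i = [], 0
    while i < len(text):
        # greedy longest match against the word lexicon before falling back per character
        for j in range(len(text), i, -1):
            if text[i:j] in WORD_LEXICON:
                phones += WORD_LEXICON[text[i:j]]
                i = j
                break
        else:
            phones.append(CHAR_DEFAULT.get(text[i], text[i]))
            i += 1
    return phones

print(g2p("银行"), g2p("行人"))   # ['yin2', 'hang2'] ['xing2', 'ren2']
```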

--------------------------------------------------------------------------------------------------------

Impact and influence of modern AI in metadata management

Metadata management is evolving rapidly with the integration of artificial intelligence technologies. This paper provides a comprehensive examination of how AI transforms traditional approaches to organizing, classifying, and utilizing data resources. By comparing traditional and AI-driven metadata methods, the researchers propose an innovative framework that automates metadata generation and enhances data governance. The study highlights the potential for AI to improve dataset accessibility, usability, and management across various domains. This research is crucial for organizations seeking to leverage advanced technologies to extract maximum value from their data resources.

Authors:  Wenli Yang, Rui Fu, Muhammad Bilal Amin, Byeong Kang

Link:  https://arxiv.org/abs/2501.16605v1

Date: 2025-01-28

Summary:

Metadata management plays a critical role in data governance, resource discovery, and decision-making in the data-driven era. While traditional metadata approaches have primarily focused on organization, classification, and resource reuse, the integration of modern artificial intelligence (AI) technologies has significantly transformed these processes. This paper investigates both traditional and AI-driven metadata approaches by examining open-source solutions, commercial tools, and research initiatives. A comparative analysis of traditional and AI-driven metadata management methods is provided, highlighting existing challenges and their impact on next-generation datasets. The paper also presents an innovative AI-assisted metadata management framework designed to address these challenges. This framework leverages more advanced modern AI technologies to automate metadata generation, enhance governance, and improve the accessibility and usability of modern datasets. Finally, the paper outlines future directions for research and development, proposing opportunities to further advance metadata management in the context of AI-driven innovation and complex datasets.

--------------------------------------------------------------------------------------------------------

The Core of Approval-Based Committee Elections with Few Candidates

Mathematical voting theory receives a sophisticated treatment in this research exploring committee election methods. By investigating approval-based elections where voters can approve multiple candidates, the study provides mathematical proof about the existence of stable committee selections. Using computational techniques and linear programming, the researchers demonstrated how proportional representation can be achieved in committee selection. This work has potential applications in democratic decision-making, organizational governance, and developing more nuanced voting systems that better represent diverse voter preferences.

Authors:  Dominik Peters

Link:  https://arxiv.org/abs/2501.18304v1

Date: 2025-01-30

Summary:


In an approval-based committee election, the goal is to select a committee consisting of k out of m candidates, based on n voters who each approve an arbitrary number of the candidates. The core of such an election consists of all committees that satisfy a certain stability property which implies proportional representation. In particular, committees in the core cannot be "objected to" by a coalition of voters who is underrepresented. The notion of the core was proposed in 2016, but it has remained an open problem whether it is always non-empty. We prove that core committees always exist when k≤8, for any number of candidates m and any number of voters n, by showing that the Proportional Approval Voting (PAV) rule due to Thiele [1895] always satisfies the core when k≤7 and always selects at least one committee in the core when k=8. We also develop an artificial rule based on recursive application of PAV, and use it to show that the core is non-empty whenever there are m≤15 candidates, for any committee size k≤m and any number of voters n. These results are obtained with the help of computer search using linear programs.
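
The PAV rule at the center of these results is compact enough to state in code: each voter contributes the harmonic number of how many of their approved candidates made the committee, and PAV picks a committee maximizing the total. The brute-force scorer below is our own illustration for small elections, not the linear-programming machinery used in the paper's proofs.

```python
from itertools import combinations
from fractions import Fraction

# Brute-force Proportional Approval Voting (PAV) for small elections.
def pav_winners(candidates, approvals, k):
    def score(committee):
        return sum(sum(Fraction(1, j) for j in range(1, len(set(committee) & a) + 1))
                   for a in approvals)
    best = max(score(c) for c in combinations(candidates, k))
    return [c for c in combinations(candidates, k) if score(c) == best]

# Toy election: 4 candidates, committee size k = 2, 4 voters with approval sets
approvals = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"d"}]
print(pav_winners("abcd", approvals, 2))
```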

--------------------------------------------------------------------------------------------------------

Fake News Detection After LLM Laundering: Measurement and Explanation

As large language models become more sophisticated, their potential for generating convincing misinformation becomes a critical concern. This research investigates the challenges of detecting fake news that has been paraphrased by AI systems. By examining how different models perform at generating and detecting paraphrased content, the study reveals significant vulnerabilities in current fake news detection methods. The researchers discovered that sentiment shifts can undermine detection efforts, providing crucial insights into the evolving landscape of digital misinformation. This work is essential for developing more robust fact-checking and content verification technologies.

Authors:  Rupak Kumar Das, Jonathan Dodge

Link:  https://arxiv.org/abs/2501.18649v1

Date: 2025-01-29

Summary:

With their advanced capabilities, Large Language Models (LLMs) can generate highly convincing and contextually relevant fake news, which can contribute to disseminating misinformation. Though there is much research on fake news detection for human-written text, the field of detecting LLM-generated fake news is still under-explored. This research measures the efficacy of detectors in identifying LLM-paraphrased fake news, in particular, determining whether adding a paraphrase step in the detection pipeline helps or impedes detection. This study makes the following contributions: (1) we show that detectors struggle to detect LLM-paraphrased fake news more than human-written text; (2) we identify which models excel at which tasks (evading detection, paraphrasing to evade detection, and paraphrasing for semantic similarity); (3) via LIME explanations, we discover a possible reason for detection failures: sentiment shift; (4) we identify a worrisome trend in paraphrase quality measurement: samples that exhibit sentiment shift despite a high BERTScore; and (5) we provide a pair of datasets augmenting existing datasets with paraphrase outputs and scores. The dataset is available on GitHub.
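
The sentiment-shift failure mode is easy to illustrate: a paraphrase can stay lexically very close to the source while flipping its sentiment. The toy check below uses a tiny word-list sentiment scorer and a Jaccard overlap as crude stand-ins for a real sentiment model and BERTScore; it is our illustration, not the paper's evaluation code.

```python
# Toy check for "sentiment shift despite high similarity" in paraphrases.
POS = {"good", "safe", "effective", "benefit"}
NEG = {"bad", "unsafe", "harmful", "risk"}

def sentiment(text):
    words = text.lower().split()
    return sum(w in POS for w in words) - sum(w in NEG for w in words)

def overlap(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def sentiment_shift_despite_similarity(original, paraphrase, sim_threshold=0.5):
    shifted = (sentiment(original) > 0) != (sentiment(paraphrase) > 0)
    return shifted and overlap(original, paraphrase) >= sim_threshold

print(sentiment_shift_despite_similarity(
    "the new vaccine is safe and effective",
    "the new vaccine is effective but unsafe"))   # True: similar words, flipped sentiment
```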

--------------------------------------------------------------------------------------------------------

RIS Assisted Wireless Communication: Advanced Modeling, Simulation, and Analytical Insights

Reconfigurable intelligent surfaces (RIS) represent a cutting-edge approach to enhancing wireless communication systems. This study provides a comprehensive framework for modeling and simulating RIS-assisted communication, addressing limitations in traditional antenna design and communication system modeling. By developing a holistic simulation approach that captures signal generation, propagation, and reception, the researchers offer valuable insights into system performance. The work has significant implications for improving wireless communication technologies, with potential applications in 5G networks, internet of things (IoT) devices, and advanced telecommunications infrastructure.

Authors:  Xiaocun Zong, Fan Yang, Zhijun Zhang, Shenheng Xu, Maokun Li

Link:  https://arxiv.org/abs/2501.15917v1

Date: 2025-01-27

Summary:

This article presents a novel perspective for modeling and simulating reconfigurable intelligent surface (RIS)-assisted communication systems. Traditional methods in antenna design often rely on array-based simulation, whereas communication system modeling tends to idealize antenna behavior. Neither approach sufficiently captures the detailed characteristics of RIS-assisted communication. To address this limitation, we propose a comprehensive simulation framework that jointly models RIS antenna design and the communication process. This framework simulates the entire communication pipeline, encompassing signal generation, modulation, propagation, RIS-based radiation, signal reception, alignment, demodulation, decision, and processing. Using a QPSK-modulated signal for validation, we analyze system performance and investigate the relationship between bit error rate (BER), aperture fill time, array size, and baseband symbol frequency. The results indicate that larger array sizes and higher baseband symbol frequencies exacerbate aperture fill time effects, leading to increased BER. Furthermore, we examine BER variation with respect to signal-to-noise ratio (SNR) and propose an optimal matching-based alignment algorithm, which significantly reduces BER compared to conventional pilot-based alignment methods. This work demonstrates the entire process of RIS communication and reveals the sources of bit errors, providing valuable insights into the design and performance optimization of RIS-assisted communication systems.
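
For orientation, a bare-bones QPSK-over-AWGN bit-error-rate sweep like the one below is the kind of link-level building block that the paper's end-to-end pipeline extends with RIS radiation, aperture fill time, and alignment effects. The script is our own toy Monte Carlo and omits all RIS-specific modeling.

```python
import numpy as np

# Monte Carlo BER of Gray-mapped QPSK over an AWGN channel (per-symbol SNR sweep).
rng = np.random.default_rng(0)
n_bits = 200_000

for snr_db in (0, 4, 8, 12):
    bits = rng.integers(0, 2, n_bits)
    # Map bit pairs to unit-energy QPSK symbols: bit 0 -> +1, bit 1 -> -1 on each axis
    symbols = ((1 - 2 * bits[0::2]) + 1j * (1 - 2 * bits[1::2])) / np.sqrt(2)
    noise_std = np.sqrt(1 / (2 * 10 ** (snr_db / 10)))
    rx = symbols + noise_std * (rng.normal(size=symbols.shape)
                                + 1j * rng.normal(size=symbols.shape))
    bit_hat = np.empty(n_bits, dtype=int)
    bit_hat[0::2] = (rx.real < 0).astype(int)     # per-axis hard decisions
    bit_hat[1::2] = (rx.imag < 0).astype(int)
    print(snr_db, "dB  BER =", np.mean(bit_hat != bits))
```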

--------------------------------------------------------------------------------------------------------

MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

Medical artificial intelligence receives a significant boost with the introduction of MedXpertQA, a comprehensive benchmark for evaluating advanced medical reasoning. Spanning 17 medical specialties and 11 body systems, this benchmark includes both text and multimodal evaluation sets with expert-level exam questions. By incorporating rigorous data filtering, expert reviews, and diverse clinical scenarios, the research aims to push the boundaries of AI's medical understanding. This work is crucial for developing more sophisticated medical AI systems that can support healthcare professionals, potentially revolutionizing medical diagnosis, education, and decision-making processes.

Authors:  Yuxin Zuo, Shang Qu, Yifei Li, Zhangren Chen, Xuekai Zhu, Ermo Hua, Kaiyan Zhang, Ning Ding, Bowen Zhou

Link:  https://arxiv.org/abs/2501.18362v1

Date: 2025-01-30

Summary:

We introduce MedXpertQA, a highly challenging and comprehensive benchmark to evaluate expert-level medical knowledge and advanced reasoning. MedXpertQA includes 4,460 questions spanning 17 specialties and 11 body systems. It includes two subsets, Text for text evaluation and MM for multimodal evaluation. Notably, MM introduces expert-level exam questions with diverse images and rich clinical information, including patient records and examination results, setting it apart from traditional medical multimodal benchmarks with simple QA pairs generated from image captions. MedXpertQA applies rigorous filtering and augmentation to address the insufficient difficulty of existing benchmarks like MedQA, and incorporates specialty board questions to improve clinical relevance and comprehensiveness. We perform data synthesis to mitigate data leakage risk and conduct multiple rounds of expert reviews to ensure accuracy and reliability. We evaluate 16 leading models on MedXpertQA. Moreover, medicine is deeply connected to real-world decision-making, providing a rich and representative setting for assessing reasoning abilities beyond mathematics and code. To this end, we develop a reasoning-oriented subset to facilitate the assessment of o1-like models.

--------------------------------------------------------------------------------------------------------


EYE ON A.I. GETS READERS UP TO DATE ON THE LATEST FUNDING NEWS AND RELATED ISSUES. SUBSCRIBE FOR THE WEEKLY NEWSLETTER.