Week Ending 7.28.2024


Lessons from Learning to Spin "Pens"

This paper explores in-hand manipulation of pen-like objects, a crucial skill for using many everyday tools. Current learning-based methods struggle with this task due to limited high-quality demonstrations and simulation-to-real-world gaps. The researchers use reinforcement learning to train an oracle policy in simulation, generating a dataset for pre-training and real-world trajectory replay. They then fine-tune the policy using real-world data. With fewer than 50 trajectories, their approach can rotate various pen-like objects for multiple revolutions. This work could improve robotic manipulation capabilities for tasks involving tools like screwdrivers or hammers in manufacturing, maintenance, or household robotics applications.

Authors:  Jun Wang, Ying Yuan, Haichuan Che, Haozhi Qi, Yi Ma, Jitendra Malik, Xiaolong Wang

Link:  https://arxiv.org/abs/2407.18902v1

Date: 2024-07-26


AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild

This study addresses the challenge of 3D hand reconstruction in unconstrained environments, which is hindered by a lack of diverse in-the-wild datasets. The researchers propose AttentionHand, a method for generating controllable hand images guided by text prompts. By creating numerous varied hand images aligned with 3D labels, they build a new dataset to bridge the gap between indoor and outdoor scenes. The approach uses four input modalities and employs text and visual attention stages in a diffusion-based pipeline. This technology could enhance hand tracking in augmented reality, sign language recognition, or gesture-based interfaces for various applications.

Authors:  Junho Park, Kyeongbo Kong, Suk-Ju Kang

Link:  https://arxiv.org/abs/2407.18034v1

Date: 2024-07-25


OVR: A Dataset for Open Vocabulary Temporal Repetition Counting in Videos

This paper introduces OVR, a large-scale dataset of temporal repetition annotations for videos. It covers over 72,000 videos drawn from Kinetics and Ego4D, spanning both third-person and first-person viewpoints across diverse actions. Each annotation includes a repetition count, start and end times, and a free-form description of what is repeating. The researchers also propose a baseline transformer model, OVRCounter, for localizing and counting repetitions in videos up to 320 frames long. This dataset and model could benefit action recognition, sports analysis, exercise tracking, and automated video indexing applications, enabling more nuanced understanding of repetitive actions in diverse video content.

Authors:  Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Andrew Zisserman

Link:  https://arxiv.org/abs/2407.17085v1

Date: 2024-07-24


Patched RTC: evaluating LLMs for diverse software development tasks

This paper presents Patched Round-Trip Correctness (Patched RTC), a novel evaluation technique for Large Language Models (LLMs) in software development tasks like bug fixing and code review. It extends the original Round-Trip Correctness method to work with any LLM and task, offering a self-evaluating framework without human intervention. The study implements Patched RTC in an open-source framework called patchwork, allowing transparent evaluation across various workflows. This approach could significantly improve the assessment and selection of LLMs for software development tasks, potentially enhancing code quality, reducing bugs, and streamlining the development process in various industries.

Authors:  Asankhaya Sharma

Link:  https://arxiv.org/abs/2407.16557v1

Date: 2024-07-23


Psychomatics -- A Multidisciplinary Framework for Understanding Artificial Minds

This paper introduces Psychomatics, a multidisciplinary framework bridging cognitive science, linguistics, and computer science to understand how Large Language Models (LLMs) process information compared to human cognition. The researchers focus on language development and use, drawing parallels between LLMs and biological systems. They highlight LLMs' ability to map and manipulate complex linguistic patterns but note their limitations in experiential, emotional, and embodied aspects of cognition. This framework could inform the development of more human-like AI systems, potentially improving natural language interfaces, enhancing AI-human interaction, and deepening our understanding of both artificial and biological intelligence.

Authors:  Giuseppe Riva, Fabrizia Mantovani, Brenda K. Wiederhold, Antonella Marchetti, Andrea Gaggioli

Link:  https://arxiv.org/abs/2407.16444v1

Date: 2024-07-23


A deeper look at depth pruning of LLMs

This study explores different block importance metrics for pruning Large Language Models (LLMs), including adaptive metrics like Shapley value. The researchers extend their analysis to individual self-attention and feed-forward layers, finding that self-attention layers are more amenable to pruning. They also investigate performance recovery techniques using lightweight adapters. This work could lead to more efficient LLM deployment, reducing computational resources and energy consumption while maintaining performance. Potential applications include optimizing language models for mobile devices, improving real-time language processing in resource-constrained environments, and making large-scale language models more accessible for various applications.

Authors:  Shoaib Ahmed Siddiqui, Xin Dong, Greg Heinrich, Thomas Breuel, Jan Kautz, David Krueger, Pavlo Molchanov

Link:  https://arxiv.org/abs/2407.16286v1

Date: 2024-07-23


DDK: Distilling Domain Knowledge for Efficient Large Language Models

This paper introduces DDK, a new Large Language Model (LLM) distillation framework that dynamically adjusts the composition of the distillation dataset based on domain performance differences between teacher and student models. This approach addresses the issue of uneven knowledge transfer across domains in existing distillation methods. The researchers demonstrate that DDK significantly improves student model performance, outperforming both continuously pretrained baselines and existing knowledge distillation methods. This technique could lead to more efficient and effective LLMs for domain-specific applications, such as specialized chatbots, industry-specific language models, or tailored language understanding systems for various sectors.

Authors:  Jiaheng Liu, Chenchen Zhang, Jinyang Guo, Yuanxing Zhang, Haoran Que, Ken Deng, Zhiqi Bai, Jie Liu, Ge Zhang, Jiakai Wang, Yanan Wu, Congnan Liu, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng

Link:  https://arxiv.org/abs/2407.16154v1

Date: 2024-07-23
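
To make the DDK idea of adjusting the distillation data mix by domain gap more concrete, here is a minimal sketch. The summary does not give the actual update rule, so the softmax over per-domain loss gaps, the function name domain_sampling_weights, the temperature parameter, and the example losses are all assumptions made for illustration only.

```python
import math

def domain_sampling_weights(student_loss, teacher_loss, temperature=1.0):
    """Turn per-domain student-teacher loss gaps into sampling probabilities.

    Hypothetical rule (not taken from the paper): domains where the student
    lags the teacher the most are sampled more often in the next round.
    """
    gaps = {d: student_loss[d] - teacher_loss[d] for d in student_loss}
    max_gap = max(gaps.values())  # subtracted for numerical stability
    exps = {d: math.exp((g - max_gap) / temperature) for d, g in gaps.items()}
    total = sum(exps.values())
    return {d: e / total for d, e in exps.items()}

# Made-up validation losses per domain, for illustration only.
student = {"code": 2.4, "math": 3.1, "law": 1.9}
teacher = {"code": 1.8, "math": 1.7, "law": 1.6}
weights = domain_sampling_weights(student, teacher)
```

In this toy setup the "math" domain has the largest gap, so it receives the largest share of the next batch; recomputing the weights periodically during training is what would make the mix dynamic.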


Towards Robust Knowledge Tracing Models via k-Sparse Attention

This study proposes sparseKT, a framework to improve the robustness and generalization of attention-based Deep Learning Knowledge Tracing (DLKT) models. The researchers incorporate a k-selection module that keeps only the items with the highest attention scores, using two sparsification heuristics. This approach helps DLKT models focus on relevant student interactions and achieves predictive performance comparable to that of state-of-the-art models. The sparseKT framework could enhance personalized learning systems, adaptive educational technologies, and intelligent tutoring systems by providing more accurate and robust predictions of student knowledge and performance across various educational contexts.

Authors:  Shuyan Huang, Zitao Liu, Xiangyu Zhao, Weiqi Luo, Jian Weng

Link:  https://arxiv.org/abs/2407.17097v1

Date: 2024-07-24
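
The k-selection step described in the sparseKT summary can be sketched as a top-k mask applied before the softmax. This is an illustrative version only: the summary does not specify the paper's two heuristics, and the function name k_sparse_attention and the example scores are assumptions.

```python
import math

def k_sparse_attention(scores, k):
    """Keep the k largest raw attention scores, mask the rest, then softmax."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    # Indices of the k largest scores; ties go to the earlier position.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    # Softmax over the kept scores only; masked positions get weight 0.
    max_kept = max(scores[i] for i in keep)
    exps = [math.exp(s - max_kept) if i in keep else 0.0
            for i, s in enumerate(scores)]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical attention scores over five past student interactions.
weights = k_sparse_attention([2.0, 0.1, 1.5, -0.3, 0.9], k=2)
```

Only the first and third interactions keep non-zero weight here, which is the intended effect: the model attends to a few relevant interactions instead of spreading weight over the whole history.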


From Sands to Mansions: Enabling Automatic Full-Life-Cycle Cyberattack Construction with LLM

This paper introduces AURORA, an automatic end-to-end cyberattack construction and emulation framework leveraging Large Language Models (LLMs). AURORA can autonomously build multi-stage cyberattack plans based on Cyber Threat Intelligence reports, construct emulation infrastructures, and execute attack procedures. The framework incorporates a wider range of attack techniques than professional red teams and can construct attacks and infrastructures in minutes without human intervention. This technology could significantly enhance cybersecurity testing and evaluation, helping organizations identify vulnerabilities, improve defense strategies, and train security personnel more effectively against advanced and evolving cyber threats.

Authors:  Lingzhi Wang, Jiahui Wang, Kyle Jung, Kedar Thiagarajan, Emily Wei, Xiangmin Shen, Yan Chen, Zhenyuan Li

Link:  https://arxiv.org/abs/2407.16928v1

Date: 2024-07-24


Take a Step and Reconsider: Sequence Decoding for Self-Improved Neural Combinatorial Optimization

This paper presents a novel sequence decoding method for self-improved learning in Neural Combinatorial Optimization (NCO). The approach uses sampling without replacement and incrementally follows the best solution found, repeating the process from intermediate partial solutions. By modifying the policy to ignore previously sampled sequences, it increases solution diversity. The method shows strong performance on various optimization problems, including the Traveling Salesman, Capacitated Vehicle Routing, and Job Shop Scheduling Problems. This technique could enhance optimization algorithms for logistics, supply chain management, manufacturing scheduling, and other industries relying on complex combinatorial problem-solving.

Authors:  Jonathan Pirnay, Dominik G. Grimm

Link:  https://arxiv.org/abs/2407.17206v1

Date: 2024-07-24
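
Sampling without replacement, as mentioned in the summary above, is often implemented with the Gumbel-top-k trick. The sketch below applies it to a flat list of candidates rather than to full sequences, and the summary does not confirm that the paper uses this exact mechanism; the function name and the example probabilities are assumptions.

```python
import math
import random

def sample_without_replacement(log_probs, k, rng):
    """Draw k distinct indices by perturbing log-probabilities with Gumbel noise."""
    perturbed = []
    for i, lp in enumerate(log_probs):
        # rng.random() is in [0, 1); clamp away from 0 so the log is defined.
        u = max(rng.random(), 1e-12)
        gumbel = -math.log(-math.log(u))
        perturbed.append((lp + gumbel, i))
    # The k largest perturbed values form a sample without replacement.
    perturbed.sort(reverse=True)
    return [i for _, i in perturbed[:k]]

# Hypothetical next-step probabilities over four candidate moves.
probs = [0.5, 0.3, 0.15, 0.05]
log_probs = [math.log(p) for p in probs]
picks = sample_without_replacement(log_probs, k=3, rng=random.Random(0))
```

Because every index receives exactly one perturbed score, the same candidate can never be drawn twice, which is what increases diversity compared with drawing k independent samples.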


Advancing Brain Imaging Analysis Step-by-step via Progressive Self-paced Learning

This study introduces the Progressive Self-Paced Distillation (PSPD) framework for brain imaging analysis, addressing challenges such as heterogeneity, individual variations, and small dataset sizes. PSPD employs an adaptive and progressive pacing and distillation mechanism, allowing for dynamic curriculum adjustments based on past and present model states. The framework demonstrates superior performance and generalization capabilities across various convolutional neural networks using the Alzheimer's Disease Neuroimaging Initiative dataset. This approach could significantly improve medical image analysis, potentially enhancing early diagnosis, treatment planning, and research in neurodegenerative diseases and other brain disorders.

Authors:  Yanwu Yang, Hairui Chen, Jiesi Hu, Xutao Guo, Ting Ma

Link:  https://arxiv.org/abs/2407.16128v1

Date: 2024-07-23


GenRec: A Flexible Data Generator for Recommendations

This paper presents GenRec, a novel framework for generating synthetic user-item interactions that exhibit realistic properties observed in recommendation scenarios. Based on a stochastic generative process using latent factor modeling, GenRec offers high flexibility and a wide range of hyper-parameters for customizing interaction generation. The framework addresses the scarcity of realistic datasets in benchmarking recommender systems and social network analysis methods. GenRec could be valuable for researchers and developers in e-commerce, social media, content streaming platforms, and other recommendation-based systems, enabling more robust testing and development of recommendation algorithms without relying on sensitive user data.

Authors:  Erica Coppolillo, Simone Mungari, Ettore Ritacco, Giuseppe Manco

Link:  https://arxiv.org/abs/2407.16594v1

Date: 2024-07-23


On the Use of Immersive Digital Technologies for Designing and Operating UAVs

This paper provides a comprehensive overview of current research and developments involving immersive digital technologies, such as Digital Twin (DT) and Extended Reality (XR), for Unmanned Aerial Vehicles (UAVs). The authors explore the integration of these technologies with Artificial Intelligence algorithms to create more intelligent, adaptive, and responsive UAV systems. They also discuss research gaps and suggest future directions. This work could inform the development of advanced UAV control systems, enhancing their use in applications such as communication relay networks, disaster response, precision agriculture, and urban planning, by improving situational awareness and decision-making capabilities.

Authors:  Yousef Emami, Kai Li, Luis Almeida, Wei Ni

Link:  https://arxiv.org/abs/2407.16288v1

Date: 2024-07-23


AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents

This paper introduces AppWorld Engine, a high-quality execution environment of 9 day-to-day apps operable via 457 APIs, and AppWorld Benchmark, a suite of 750 natural, diverse, and challenging autonomous agent tasks. These tools address the gap in existing benchmarks for tool use, which typically cover only simple API call sequences. AppWorld supports robust programmatic evaluation with state-based unit tests, allowing for different task completion methods while checking for unexpected changes. This benchmark could significantly advance the development and evaluation of interactive coding agents, potentially improving AI assistants, automation tools, and software testing methodologies across various industries.

Authors:  Harsh Trivedi, Tushar Khot, Mareike Hartmann, Ruskin Manku, Vinty Dong, Edward Li, Shashank Gupta, Ashish Sabharwal, Niranjan Balasubramanian

Link:  https://arxiv.org/abs/2407.18901v1

Date: 2024-07-26
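
The state-based evaluation described for AppWorld can be pictured as comparing snapshots of application state before and after an agent runs. The sketch below is not AppWorld's actual API: the flat dictionary state, the key names, and the functions diff_state and check_task are hypothetical.

```python
def diff_state(before, after):
    """Return {key: (old, new)} for every key whose value changed."""
    keys = set(before) | set(after)
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}

def check_task(before, after, expected_changes):
    """Pass only if the expected changes happened and nothing else changed."""
    actual = {k: new for k, (_, new) in diff_state(before, after).items()}
    return actual == expected_changes

# Hypothetical task: move 10 units from alice to bob.
before = {"wallet.alice": 100, "wallet.bob": 20, "notes.count": 3}
expected = {"wallet.alice": 90, "wallet.bob": 30}

after_good = {"wallet.alice": 90, "wallet.bob": 30, "notes.count": 3}
after_bad = {"wallet.alice": 90, "wallet.bob": 30, "notes.count": 2}

passed_good = check_task(before, after_good, expected)
passed_bad = check_task(before, after_bad, expected)
```

Checking the final state rather than the sequence of calls lets the agent solve a task in any way it likes, while the second case shows how collateral damage (a deleted note) still fails the test.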


Adversarial Robust Decision Transformer: Enhancing Robustness of RvS via Minimax Returns-to-go

This study proposes the Adversarial Robust Decision Transformer (ARDT), a worst-case-aware Reinforcement Learning via Supervised Learning (RvS) algorithm. ARDT learns and conditions the policy on in-sample minimax returns-to-go, aligning the target return with the worst-case return learned through minimax expectile regression. This approach enhances robustness against powerful test-time adversaries in sequential games and continuous adversarial RL environments. ARDT could improve the performance and reliability of AI systems in adversarial or uncertain environments, with potential applications in game theory, cybersecurity, financial modeling, and autonomous systems operating in complex, dynamic scenarios.

Authors:  Xiaohang Tang, Afonso Marques, Parameswaran Kamalaruban, Ilija Bogunovic

Link:  https://arxiv.org/abs/2407.18414v1

Date: 2024-07-25
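
Expectile regression, the building block named in the ARDT summary, can be illustrated on a handful of scalar returns. This sketch shows plain expectile estimation only, not the paper's minimax procedure; the function name fit_expectile, the fixed-point solver, and the sample returns are assumptions.

```python
def fit_expectile(targets, tau, steps=200):
    """Estimate the tau-expectile of a list of numbers.

    The expectile minimizes sum(w_i * (t_i - m)**2) with w_i = tau when
    t_i > m and (1 - tau) otherwise, so it is a weighted mean whose
    weights depend on m. We iterate that weighted mean to a fixed point.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must be strictly between 0 and 1")
    m = sum(targets) / len(targets)
    for _ in range(steps):
        weights = [tau if t > m else 1.0 - tau for t in targets]
        m = sum(w * t for w, t in zip(weights, targets)) / sum(weights)
    return m

# Hypothetical returns observed from one state under different adversary moves.
returns = [1.0, 2.0, 3.0, 10.0]
pessimistic = fit_expectile(returns, tau=0.05)  # leans toward the worst case
neutral = fit_expectile(returns, tau=0.5)       # equals the ordinary mean
optimistic = fit_expectile(returns, tau=0.95)   # leans toward the best case
```

A small tau pulls the estimate toward the minimum and a large tau toward the maximum, which is why expectiles are a convenient smooth stand-in for min and max operators when estimating worst-case returns from data.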


Co-designing an AI Impact Assessment Report Template with AI Practitioners and AI Compliance Experts

This paper presents a co-designed template for AI impact assessment reports, developed through an iterative process with AI practitioners and compliance experts. The template is grounded in the EU AI Act, NIST's AI Risk Management Framework, and the ISO 42001 AI Management System standard. According to the authors, it provides the information needed for impact assessments and documents the broad impacts of AI systems. This tool could be valuable for companies developing AI systems, helping them comply with regulations, guide the design stage of AI uses, and assess potential impacts. It may also aid policymakers and regulators in standardizing AI impact assessments across industries.

Authors:  Edyta Bogucka, Marios Constantinides, Sanja Šćepanović, Daniele Quercia

Link:  https://arxiv.org/abs/2407.17374v1

Date: 2024-07-24


Dataset Distribution Impacts Model Fairness: Single vs. Multi-Task Learning

This study evaluates the performance of skin lesion classification using ResNet-based CNNs, focusing on patient sex variations in training data and three different learning strategies. The researchers present a linear programming method for generating datasets with varying patient sex and class labels. They find that sex-specific training data yields better results, single-task models exhibit sex bias, and datasets including male patients enhance model performance for the male subgroup. This research could inform the development of fairer and more accurate medical image analysis models, potentially improving diagnostic accuracy and reducing biases in healthcare applications.

Authors:  Ralf Raumanns, Gerard Schouten, Josien P. W. Pluim, Veronika Cheplygina

Link:  https://arxiv.org/abs/2407.17543v1

Date: 2024-07-24


Lawma: The Power of Specialization for Legal Tasks

This paper conducts a comprehensive study of 260 legal text classification tasks, comparing the performance of GPT-4 and a fine-tuned Llama 3 model. The researchers demonstrate that a lightly fine-tuned Llama 3 model vastly outperforms GPT-4 on almost all tasks, typically by double-digit percentage points. They also show that a single model can be fine-tuned on all 260 tasks simultaneously with only a small loss in accuracy. This work could significantly impact empirical legal research, offering a more efficient and accurate alternative to traditional human annotation or prompting commercial models for legal text classification tasks.

Authors:  Ricardo Dominguez-Olmedo, Vedant Nanda, Rediet Abebe, Stefan Bechtold, Christoph Engel, Jens Frankenreiter, Krishna Gummadi, Moritz Hardt, Michael Livermore

Link:  https://arxiv.org/abs/2407.16615v1

Date: 2024-07-23


Privacy Threats and Countermeasures in Federated Learning for Internet of Things: A Systematic Review

This systematic review analyzes recent literature to identify privacy threats in Federated Learning (FL) within Internet of Things (IoT) environments and evaluates defensive measures to mitigate these threats. The researchers identify various privacy threats, including inference attacks, poisoning attacks, and eavesdropping, along with defensive measures such as Differential Privacy and Secure Multi-Party Computation. This work could inform the development of more secure and privacy-preserving FL systems for IoT applications, potentially improving data protection in smart homes, industrial IoT, healthcare IoT, and other connected environments.

Authors:  Adel ElZemity, Budi Arief

Link:  https://arxiv.org/abs/2407.18096v1

Date: 2024-07-25


Sublinear Regret for An Actor-Critic Algorithm in Continuous-Time Linear-Quadratic Reinforcement Learning

This study presents a model-free approach to reinforcement learning for continuous-time linear-quadratic control problems where the volatility of state processes depends on both state and control variables. The researchers devise an actor-critic algorithm to learn the optimal policy parameter directly and introduce a novel exploration schedule. They prove that the algorithm achieves a sublinear regret bound and provide convergence rates. This work could enhance control systems in various applications, such as robotics, autonomous vehicles, and financial modeling, where continuous-time dynamics and state-dependent volatility are important considerations.

Authors:  Yilie Huang, Yanwei Jia, Xun Yu Zhou

Link:  https://arxiv.org/abs/2407.17226v1

Date: 2024-07-24


