#W19 #research #2023

Fill-in-the-blanks Self-Supervision in NLP

arxiv.org #2023 #MAY #W19 #arxiv.org #research

·towardsdatascience.com·May 12, 2023


node2vec

arxiv.org #arxiv.org #2023 #MAY #W19 #research

·snap.stanford.edu·May 12, 2023


On the Security Risks of Knowledge Graph Reasoning

Knowledge graph reasoning (KGR) -- answering complex logical queries over large knowledge graphs -- represents an important artificial intelligence task, entailing a range of applications (e.g., cyber threat hunting). However, despite its surging popularity, the potential security risks of KGR are largely unexplored, which is concerning, given the increasing use of such capability in security-critical domains. This work represents a solid initial step towards bridging the striking gap. We systematize the security threats to KGR according to the adversary's objectives, knowledge, and attack vectors. Further, we present ROAR, a new class of attacks that instantiate a variety of such threats. Through empirical evaluation in representative use cases (e.g., medical decision support, cyber threat hunting, and commonsense reasoning), we demonstrate that ROAR is highly effective to mislead KGR to suggest pre-defined answers for target queries, yet with negligible impact on non-target ones. Finally, we explore potential countermeasures against ROAR, including filtering of potentially poisoning knowledge and training with adversarially augmented queries, which leads to several promising research directions.

arxiv.org #arxiv.org #2023 #MAY #W19 #research

·arxiv.org·May 8, 2023


AutoML-GPT: Automatic Machine Learning with GPT

AI tasks encompass a wide range of domains and fields. While numerous AI models have been designed for specific tasks and applications, they often require considerable human efforts in finding the right model architecture, optimization algorithm, and hyperparameters. Recent advances in large language models (LLMs) like ChatGPT show remarkable capabilities in various aspects of reasoning, comprehension, and interaction. Consequently, we propose developing task-oriented prompts and automatically utilizing LLMs to automate the training pipeline. To implement this concept, we present the AutoML-GPT, which employs GPT as the bridge to diverse AI models and dynamically trains models with optimized hyperparameters. AutoML-GPT dynamically takes user requests from the model and data cards and composes the corresponding prompt paragraph. Ultimately, with this prompt paragraph, AutoML-GPT will automatically conduct the experiments from data processing to model architecture, hyperparameter tuning, and predicted training log. By leveraging its robust language capabilities and the available AI models, AutoML-GPT can tackle numerous intricate AI tasks across various tasks and datasets. This approach achieves remarkable results in computer vision, natural language processing, and other challenging areas. Extensive experiments and ablation studies demonstrate that our method can be general, effective, and beneficial for many AI tasks.

arxiv.org #arxiv.org #2023 #MAY #W19 #research

·arxiv.org·May 8, 2023


What would a compute monitoring plan look like? [Linkpost] - LessWrong

Yonadav Shavit (CS PhD student at Harvard) recently released a paper titled "What does it take to catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring." …

arxiv.org #2023 #MAY #W19 #arxiv.org #research

·lesswrong.com·May 8, 2023


GitHub - IBM/MAX-Audio-Embedding-Generator: Generate embedding vectors from audio files

Generate embedding vectors from audio files. Contribute to IBM/MAX-Audio-Embedding-Generator development by creating an account on GitHub.

arxiv.org #arxiv.org #2023 #MAY #W19 #github.com #research

·github.com·May 13, 2023
