News

08-MAY-2021

Private Cross-Silo Federated Learning for Extracting Vaccine Adverse Event Mentions

Automatically extracting mentions of suspected drug or vaccine adverse events (potential side effects) from unstructured text is critical in the current pandemic, but the small amounts of labeled training data remain siloed across organizations due to privacy concerns. Federated Learning (FL) is quickly becoming a go-to distributed training paradigm for such users to jointly train a more accurate global model without physically sharing their data. However, literature on successful application of FL in real-world problem settings is somewhat sparse. In this paper, we describe our experience applying an FL-based solution to the Named Entity Recognition (NER) task for an adverse event detection application in the context of mass-scale vaccination programs. Furthermore, we show that Differential Privacy (DP), which offers stronger privacy guarantees, severely cripples the global model’s prediction accuracy, thus disincentivizing users from participating in the federation. We demonstrate how recent innovations in personalization methods can help significantly recover the lost accuracy.
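
As a rough illustration of the trade-off described above, the following minimal sketch shows one round of federated averaging in which each silo's model update is clipped and Gaussian noise is added to the average. It is not the paper's method: the function names `clip_update` and `dp_federated_round`, the parameters `max_norm` and `noise_std`, and the clip-then-noise recipe itself are assumptions chosen for illustration, and the noise is not calibrated to any formal privacy budget.

```python
import math
import random


def clip_update(update, max_norm):
    """Scale an update vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    if norm == 0.0 or norm <= max_norm:
        return list(update)
    scale = max_norm / norm
    return [x * scale for x in update]


def dp_federated_round(global_weights, silo_updates, max_norm, noise_std, rng):
    """One hypothetical round: clip each silo's update, average, add noise.

    global_weights: list of floats (current global model)
    silo_updates:   list of per-silo update vectors, same length as the model
    max_norm:       clipping bound (assumed hyperparameter)
    noise_std:      standard deviation of the Gaussian noise (assumed)
    rng:            a random.Random instance, passed in for reproducibility
    """
    clipped = [clip_update(u, max_norm) for u in silo_updates]
    n = len(clipped)
    new_weights = []
    for i, w in enumerate(global_weights):
        mean_delta = sum(u[i] for u in clipped) / n
        noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
        new_weights.append(w + mean_delta + noise)
    return new_weights


if __name__ == "__main__":
    rng = random.Random(0)
    updates = [[3.0, 4.0], [0.0, 0.0]]
    # Without noise the result is just the average of the clipped updates.
    print(dp_federated_round([0.0, 0.0], updates, max_norm=1.0, noise_std=0.0, rng=rng))
    # With noise the same round drifts away from that average.
    print(dp_federated_round([0.0, 0.0], updates, max_norm=1.0, noise_std=0.5, rng=rng))
```

Raising `noise_std` or lowering `max_norm` moves the global model further from the plain average, which is one intuition for the accuracy loss the abstract reports.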

Read


30-APR-2021

Automated GPU Out-of-Bound Access Detection and Prevention in a Managed Environment

GPUs have proven extremely effective at accelerating general-purpose workloads in fields from numerical simulation to deep learning and finance. However, even code written by experienced GPU programmers often offers little robustness, limiting the GPUs’ adoption in critical applications’ acceleration. Out-of-bounds array accesses are one of the most common sources of errors and vulnerabilities on GPUs and can be hard to detect and prevent due to the architectural characteristics of GPUs. This work presents an automated technique ensuring detection and protection against out-of-bounds array accesses inside CUDA GPU kernels. We compile kernels ahead-of-time, invoke them at run time using the Graal polyglot Virtual Machine and execute them on the GPU. Our technique is transparent to the user and operates on the LLVM Intermediate Representation. It adds boundary checks for array accesses based on array size knowledge, available at run time thanks to the managed execution environment, and optimizes the resulting code to minimize the impact of our modifications. We test our technique on 16 different GPU kernels extracted from common GPU workloads and show that we can prevent out-of-bounds array accesses in arbitrary GPU kernels without any statistically significant execution time overhead.

Read

26-APR-2021

Optimizing Inference Performance of Transformers on CPUs

Slides to be presented at the EuroMLSys'21 workshop

View

25-APR-2021

Vate: Runtime Adaptable Probabilistic Programming in Java

Inspired by earlier work on Augur, Vate is a probabilistic programming language for the construction of JVM based models with an Object Oriented interface. As a compiled language it is able to examine the dependency graph of the model to produce optimised code that can be dynamically targeted to different platforms.

Read

25-APR-2021

Optimizing Inference Performance of Transformers on CPUs

The Transformer architecture revolutionized the field of natural language processing (NLP). Transformer-based models (e.g., BERT) power many important Web services, such as search, translation, and question-answering. While enormous research attention is paid to the training of those models, relatively little effort is made to improve their inference performance. This paper addresses this gap by presenting an empirical analysis of the scalability and performance of inference with a Transformer-based model on CPUs. Focusing on the highly popular BERT model, we identify key components of the Transformer architecture where the bulk of the computation happens, and propose an Adaptive Linear Module Optimization (ALMO) to speed them up. The optimization is evaluated using the inference benchmark from HuggingFace, and is shown to achieve a speedup of up to 1.71x. Notably, ALMO neither requires changes to the implementation of the models nor affects their accuracy.

Read

21-APR-2021

CLAMH Introduction

The Cross-Language Microbenchmark Harness (CLAMH) provides a unique environment for running software benchmarks. It is unique in that it allows comparison across different platforms and across different languages. For example, it allows the comparison of clang, gcc, llvm, and GraalVM Sulong on the same benchmark, and can also be used to compare the Java counterparts of the same benchmark running on any JVM. CLAMH allows users to verify vendor benchmark performance claims, baseline benchmark performance in their own compute environment, compare with other compute environments, and, by so doing, identify areas where performance can be improved. CLAMH has been released Open Source in the GraalVM repository - https://github.com/graalvm/CLAMH

Read

15-APR-2021

MSET2 Streaming Prognostics for IoT Telemetry on Oracle Roving Edge Infrastructure

Critical applications needed in real-world environments would be difficult or impossible to execute on the public cloud alone because of the massive bandwidth and latency needed to transmit and process vast amounts of data, as well as offer instant responses to the results of that analysis. Oracle's MSET2 prognostic ML algorithm, implemented on Roving Edge Clusters with NVIDIA Tesla T4 GPUs, attains unprecedented reductions in computational latencies and breakthrough throughput acceleration factors for large-scale ML streaming prognostics from dense-sensor fleets of assets in such fields as U.S. Department of Defense assets, utilities, oil & gas, commercial aviation, and prognostic cybersecurity for data center IT assets as well as DoD supervisory control and data acquisition assets and networks, and smart manufacturing.

View

30-MAR-2021

IFDS Taint Analysis With Access Paths

Over the years, static taint analysis emerged as the analysis of choice to detect some of the most common web application vulnerabilities, such as SQL injection (SQLi) and cross-site scripting (XSS). Furthermore, from an implementation perspective, the IFDS dataflow framework stood out as one of the most successful vehicles to implement static taint analysis for real-world Java applications. While existing approaches scale reasonably to medium-size applications (e.g. up to one hour analysis time for less than 100K lines of code), our experience suggests that no existing solution can scale to very large industrial code bases (e.g. more than 1M lines of code). In this paper, we present our novel IFDS-based solution to perform fast and precise static taint analysis of very large industrial Java web applications. Similar to state-of-the-art approaches to taint analysis, our IFDS-based taint analysis uses access paths to abstract objects and fields in a program. However, contrary to existing approaches, our analysis is demand-driven, which restricts the amount of code to be analyzed, and does not rely on a computationally expensive alias analysis, thereby significantly improving scalability.
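
To make the notion of an access path concrete, here is a minimal sketch of how tainted access paths might be represented and propagated across a single assignment. It is not the analysis from the paper: the `AccessPath` class, the `propagate_assignment` function, and the simplification of always overwriting the left-hand side are hypothetical choices for illustration, and the sketch covers neither the IFDS machinery nor the demand-driven strategy.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AccessPath:
    """A base variable followed by a sequence of field names, e.g. req.param."""
    base: str
    fields: Tuple[str, ...] = ()

    def starts_with(self, prefix):
        """True if this path is `prefix` or reaches deeper through it."""
        return (self.base == prefix.base
                and self.fields[:len(prefix.fields)] == prefix.fields)

    def rebase(self, old_prefix, new_prefix):
        """Replace the leading `old_prefix` of this path with `new_prefix`."""
        suffix = self.fields[len(old_prefix.fields):]
        return AccessPath(new_prefix.base, new_prefix.fields + suffix)

    def __str__(self):
        return ".".join((self.base,) + self.fields)


def propagate_assignment(tainted, lhs, rhs):
    """Transfer function for the statement `lhs = rhs` (illustrative only).

    Simplifying assumption: the assignment overwrites everything reachable
    through lhs, so old taint facts rooted at lhs are dropped.
    """
    out = {ap for ap in tainted if not ap.starts_with(lhs)}
    for ap in tainted:
        if ap.starts_with(rhs):
            out.add(ap.rebase(rhs, lhs))
    return out


if __name__ == "__main__":
    facts = {AccessPath("req", ("param",))}
    # q = req  makes q.param tainted as well
    facts = propagate_assignment(facts, AccessPath("q"), AccessPath("req"))
    print(sorted(str(ap) for ap in facts))
```

Each fact names exactly which field chain carries tainted data, which is what lets an analysis distinguish `q.param` from an untainted `q.other`.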

Read

26-MAR-2021

Generality—or Not—in a Domain-Specific Language (A Case Study)

Slides for an invited keynote at the 2021 conference (https://2021.programming-conference.org). One-sentence abstract: This talk is an overview and critique of the (overall very successful) programming language used in BibTeX style (.bst) files, for the purpose of illustrating some general principles to keep in mind when designing domain-specific languages. Abstract for the keynote: In 2017 I took a look at a widely used programming language that had no name, so I called it “Computer Science Metanotation” (CSM), and I observed that it had grown in interesting and sometimes inconsistent ways. For this talk I will examine another, perhaps even more widely used language that also has no name. Unlike CSM, it has remained almost unchanged for the last 33 years—yet programmers continue to write new applications in it today and even attempt to broaden its programming style, despite the fact that it is an extremely domain-specific language (DSL) that clearly was not designed for growth. It is a functional language—or is it? We will explain the language briefly, then use it as a case study—based on my own experience in wrestling with it—to explore more general questions about language design. How can we find the right balance in a DSL between specificity (which can make it much easier to tackle the intended application domain) and generality (which can support language growth, new application domains, or just a broader view of the original domain)? To what extent should even a domain-specific language be self-aware?

View

23-MAR-2021

Are many heaps better than one?

The recent introduction by Intel of widely available Non-Volatile RAM has reawakened interest in persistence, a hot topic of the 1980s and 90s. The most ambitious schemes of that era were not adopted; I will speculate as to why, and introduce a new approach based on multiple heaps, designed to overcome the problems. I’ll present the main features of the new persistence model, and describe a prototype implementation I’ve been working on for GraalVM Native Image. The purpose of this work-in-progress is to allow experimentation with the new model, so that the community can assess its desirability. I’ll outline the main features of the prototype and some of the remaining challenges.

View

22-MAR-2021

Fast and Efficient Java Microservices With GraalVM @ Oracle Developer Live

Slides for the Oracle Developer Live - Java Innovations conference. This talk focuses on the benefits of Native Image and recent updates.

View

15-MAR-2021

How to program machine learning in Java with the Tribuo library

Tribuo is a new open source library written in Java from Oracle Labs’ Machine Learning Research Group. The team’s goal for Tribuo is to build an ML library for the Java platform that is more in line with the needs of large software systems. Tribuo operates on objects, not primitive arrays; its models are self-describing and reproducible; and it provides a uniform interface over many kinds of prediction tasks.

Read

09-MAR-2021

ColdPress: An Extensible Malware Analysis Platform for Threat Intelligence

Malware analysis is still largely a manual task. This slow and inefficient approach does not scale to the exponential rise in the rate of new unique malware generated. Hence, automating the process as much as possible becomes desirable. In this paper, we present ColdPress – an extensible malware analysis platform that automates the end-to-end process of malware threat intelligence gathering, integrated with output modules that generate reports in arbitrary file formats. ColdPress combines state-of-the-art tools and concepts into a modular system that helps the analyst extract information from malware samples efficiently and effectively. It is designed as a user-friendly platform that can be easily extended with user-defined modules. We evaluated ColdPress with complex real-world malware samples (e.g., WannaCry), demonstrating its efficiency, performance and usefulness to security analysts. Our demo video is available at https://youtu.be/AwlBo1rxR1U.

Read

01-MAR-2021

Online Post-Processing in Rankings for Fair Utility Maximization

We consider the problem of utility maximization in online ranking applications while also satisfying a pre-defined fairness constraint. We consider batches of items which arrive over time, already ranked using an existing ranking model. We propose online post-processing for re-ranking these batches to enforce adherence to the pre-defined fairness constraint, while maximizing a specific notion of utility. To achieve this goal, we propose two deterministic re-ranking policies. In addition, we learn a re-ranking policy based on a novel variation of learning to search. Extensive experiments on real-world and synthetic datasets demonstrate the effectiveness of our proposed policies both in terms of adherence to the fairness constraint and utility maximization. Furthermore, our analysis shows that the performance of the proposed policies depends on the original data distribution with respect to the fairness constraint and the notion of utility.
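
The abstract does not spell out the two deterministic policies, so the following is only a hypothetical example of what a deterministic post-processing re-ranker can look like. It assumes a prefix-style fairness constraint (every top-k prefix must contain at least `floor(min_share * k)` protected items while any remain) and greedily keeps the highest-scored item whenever the constraint allows. The function name, the tuple layout, and the constraint itself are assumptions, not taken from the paper.

```python
import math


def rerank_with_prefix_fairness(batch, min_share):
    """Greedy deterministic re-ranking under an assumed prefix constraint.

    batch:     list of (name, score, is_protected) tuples, already sorted by
               descending score by the upstream ranking model
    min_share: required fraction of protected items in every prefix
    """
    protected = [item for item in batch if item[2]]
    others = [item for item in batch if not item[2]]
    result = []
    n_protected = 0
    pi = oi = 0  # next unused index in each group
    while pi < len(protected) or oi < len(others):
        k = len(result) + 1
        required = math.floor(min_share * k)
        if pi < len(protected) and n_protected < required:
            take_protected = True   # constraint forces a protected item
        elif oi >= len(others):
            take_protected = True   # only protected items are left
        elif pi >= len(protected):
            take_protected = False  # only unprotected items are left
        else:
            # Constraint is satisfied either way: keep the better score.
            take_protected = protected[pi][1] >= others[oi][1]
        if take_protected:
            result.append(protected[pi])
            pi += 1
            n_protected += 1
        else:
            result.append(others[oi])
            oi += 1
    return result


if __name__ == "__main__":
    batch = [("a", 0.9, False), ("b", 0.8, False), ("c", 0.7, True),
             ("d", 0.6, False), ("e", 0.5, True)]
    print([name for name, _, _ in rerank_with_prefix_fairness(batch, 0.5)])
    # → ['a', 'c', 'b', 'e', 'd']
```

Because items within each group keep their original relative order, the policy only gives up utility at positions where the constraint would otherwise be violated.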

Read

