IFDS Taint Analysis With Access Paths

Over the years, static taint analysis emerged as the analysis of choice to detect some of the most common web application vulnerabilities, such as SQL injection (SQLi) and cross-site scripting (XSS). Furthermore, from an implementation perspective, the IFDS dataflow framework stood out as one of the most successful vehicles to implement static taint analysis for real-world Java applications. While existing approaches scale reasonably to medium-size applications (e.g. up to one hour analysis time for less than 100K lines of code), our experience suggests that no existing solution can scale to very large industrial code bases (e.g. more than 1M lines of code). In this paper, we present our novel IFDS-based solution to perform fast and precise static taint analysis of very large industrial Java web applications. Similar to state-of-the-art approaches to taint analysis, our IFDS-based taint analysis uses access paths to abstract objects and fields in a program. However, contrary to existing approaches, our analysis is demand-driven, which restricts the amount of code to be analyzed, and does not rely on a computationally expensive alias analysis, thereby significantly improving scalability.
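
As an illustration of the access-path abstraction mentioned above, the following is a minimal sketch, not the analysis presented in the paper. It assumes that a tainted access path covers everything reachable from it, that paths are cut off at a hypothetical bound `K_LIMIT`, and that the only statement form is `lhs = rhs.f1.f2...` with a local variable on the left. The names `AccessPath` and `flow_assign` are invented for this example; a real IFDS solver, demand-driven exploration, and alias handling are all out of scope.

```python
from dataclasses import dataclass
from typing import Tuple

K_LIMIT = 3  # assumed bound on the number of fields kept in a path


@dataclass(frozen=True)
class AccessPath:
    """A base variable followed by a bounded chain of field names."""
    base: str
    fields: Tuple[str, ...] = ()

    def append(self, field: str) -> "AccessPath":
        # Dropping fields beyond the bound over-approximates: the shorter
        # path is taken to cover every path that extends it.
        if len(self.fields) >= K_LIMIT:
            return self
        return AccessPath(self.base, self.fields + (field,))


def flow_assign(fact: AccessPath, lhs: str, rhs: AccessPath) -> set:
    """Facts holding after `lhs = rhs`, given one tainted fact before it."""
    out = set()
    if fact.base != lhs:
        out.add(fact)  # facts rooted at the overwritten local are killed
    n = len(rhs.fields)
    if fact.base == rhs.base:
        if fact.fields[:n] == rhs.fields:
            # The fact is rhs itself or lies below it: re-root the rest at lhs.
            new = AccessPath(lhs)
            for f in fact.fields[n:]:
                new = new.append(f)
            out.add(new)
        elif rhs.fields[:len(fact.fields)] == fact.fields:
            # The fact is a proper prefix of rhs, so it covers rhs entirely.
            out.add(AccessPath(lhs))
    return out


if __name__ == "__main__":
    tainted = AccessPath("req", ("param",))
    # Statement: s = req.param
    print(flow_assign(tainted, "s", AccessPath("req", ("param",))))
```

In this sketch, reading `req.param` into `s` while `req.param` is tainted yields two facts, the original one and `s`, which is the kind of per-statement flow function an IFDS-style solver would apply along the control-flow graph.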



Generality—or Not—in a Domain-Specific Language (A Case Study)

Slides for an invited keynote at the 2021 conference.

One-sentence abstract: This talk is an overview and critique of the (overall very successful) programming language used in BibTeX style (.bst) files, for the purpose of illustrating some general principles to keep in mind when designing domain-specific languages.

Abstract for the keynote: In 2017 I took a look at a widely used programming language that had no name, so I called it “Computer Science Metanotation” (CSM), and I observed that it had grown in interesting and sometimes inconsistent ways. For this talk I will examine another, perhaps even more widely used language that also has no name. Unlike CSM, it has remained almost unchanged for the last 33 years, yet programmers continue to write new applications in it today and even attempt to broaden its programming style, despite the fact that it is an extremely domain-specific language (DSL) that clearly was not designed for growth. It is a functional language, or is it? We will explain the language briefly, then use it as a case study (based on my own experience in wrestling with it) to explore more general questions about language design. How can we find the right balance in a DSL between specificity (which can make it much easier to tackle the intended application domain) and generality (which can support language growth, new application domains, or just a broader view of the original domain)? To what extent should even a domain-specific language be self-aware?



Towards Intelligent Application Security

Over the past 20 years we have seen application security evolve from analysing application code through Static Application Security Testing (SAST) tools, to detecting vulnerabilities in running applications via Dynamic Application Security Testing (DAST) tools. The past 10 years have seen new flavours of tools that combine static and dynamic techniques via Interactive Application Security Testing (IAST), examine the components and libraries of the software via Software Composition Analysis (SCA), protect web applications and APIs using signature-based Web Application Firewalls (WAF), and monitor the application and block attacks through Runtime Application Self Protection (RASP) techniques. The past 10 years have also seen an increase in the uptake of the DevOps model, which combines software development and operations to provide continuous delivery of high-quality software. As security has become more important, the DevOps model has evolved into the DevSecOps model, where software development, operations and security are all integrated. There has also been increasing usage of learning techniques, including machine learning, and of program synthesis. Several tools have been developed that use machine learning to help developers make quality decisions about their code, their tests, or the runtime overhead their code produces. However, such techniques have not yet been applied to application security. In this talk I discuss how to provide an automated approach to integrating security into all aspects of application development and operations, aided by learning techniques. This approach incorporates signals from the code, operations and beyond, together with automation, to provide actionable intelligence to developers, security analysts, operations staff, and autonomous systems. I will also consider how malware and threat intelligence can be incorporated into this model to support Intelligent Application Security in a rapidly evolving world.



Are many heaps better than one?

The recent introduction by Intel of widely available Non-Volatile RAM has reawakened interest in persistence, a hot topic of the 1980s and 90s. The most ambitious schemes of that era were not adopted; I will speculate as to why, and introduce a new approach based on multiple heaps, designed to overcome the problems. I’ll present the main features of the new persistence model, and describe a prototype implementation I’ve been working on for GraalVM Native Image. The purpose of this work-in-progress is to allow experimentation with the new model, so that the community can assess its desirability. I’ll outline the main features of the prototype and some of the remaining challenges.



Fast and Efficient Java Microservices With GraalVM @ Oracle Developer Live

Slides for the Oracle Developer Live - Java Innovations conference. This talk focuses on the benefits of Native Image and on recent updates.



How to program machine learning in Java with the Tribuo library

Tribuo is a new open source library written in Java from Oracle Labs’ Machine Learning Research Group. The team’s goal for Tribuo is to build an ML library for the Java platform that is more in line with the needs of large software systems. Tribuo operates on objects, not primitive arrays, Tribuo’s models are self-describing and reproducible, and it provides a uniform interface over many kinds of prediction tasks.



ColdPress: An Extensible Malware Analysis Platform for Threat Intelligence

Malware analysis is still largely a manual task. This slow and inefficient approach does not scale to the exponential rise in the rate at which new unique malware is generated. Hence, automating the process as much as possible becomes desirable. In this paper, we present ColdPress, an extensible malware analysis platform that automates the end-to-end process of malware threat intelligence gathering and is integrated with output modules that generate reports in arbitrary file formats. ColdPress combines state-of-the-art tools and concepts into a modular system that helps the analyst extract information from malware samples efficiently and effectively. It is designed as a user-friendly platform that can be easily extended with user-defined modules. We evaluated ColdPress with complex real-world malware samples (e.g., WannaCry), demonstrating its efficiency, performance and usefulness to security analysts. Our demo video is available at



Online Post-Processing in Rankings for Fair Utility Maximization

We consider the problem of utility maximization in online ranking applications while also satisfying a pre-defined fairness constraint. We consider batches of items which arrive over time, already ranked using an existing ranking model. We propose online post-processing for re-ranking these batches to enforce adherence to the pre-defined fairness constraint, while maximizing a specific notion of utility. To achieve this goal, we propose two deterministic re-ranking policies. In addition, we learn a re-ranking policy based on a novel variation of learning to search. Extensive experiments on real-world and synthetic datasets demonstrate the effectiveness of our proposed policies both in terms of adherence to the fairness constraint and utility maximization. Furthermore, our analysis shows that the performance of the proposed policies depends on the original data distribution with respect to the fairness constraint and the notion of utility.
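
To make the idea of a deterministic re-ranking policy concrete, here is a minimal sketch under assumptions that do not come from the abstract: the fairness constraint is taken to be "every prefix of length k contains at least floor(p * k) protected items", utility is taken to be the original ranking score, and the function name `rerank` and the tuple layout are hypothetical. It is a simple greedy policy for one batch, not one of the policies proposed in the paper.

```python
import math
from collections import deque


def rerank(batch, min_fraction):
    """Greedy re-ranking of one batch.

    batch: list of (item_id, score, is_protected), sorted by score descending.
    min_fraction: assumed lower bound p on the share of protected items
                  in every prefix of the output ranking.
    """
    protected = deque(x for x in batch if x[2])
    others = deque(x for x in batch if not x[2])
    result, n_protected = [], 0
    for k in range(1, len(batch) + 1):
        need = math.floor(min_fraction * k)  # protected items required so far
        take_protected = bool(protected) and (
            not others                        # nothing else left to place
            or n_protected < need             # constraint would be violated
            or protected[0][1] >= others[0][1]  # protected item scores best
        )
        if take_protected:
            result.append(protected.popleft())
            n_protected += 1
        else:
            result.append(others.popleft())
    return result


if __name__ == "__main__":
    batch = [("a", 0.9, False), ("b", 0.8, False), ("c", 0.7, True),
             ("d", 0.6, False), ("e", 0.5, True)]
    print([item_id for item_id, _, _ in rerank(batch, 0.5)])
```

The policy keeps the original order whenever the assumed constraint allows it and promotes the best remaining protected item only when the prefix would otherwise fall short, which is one simple way to trade utility against adherence to a constraint.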



Formal Verification of Authenticated, Append-Only Skip Lists in Agda: Extended Version

Authenticated Append-Only Skiplists (AAOSLs) enable maintenance and querying of an authenticated log (such as a blockchain) without requiring any single party to store or verify the entire log, or to trust another party regarding its contents. AAOSLs can help to enable efficient dynamic participation (e.g., in consensus) and reduce storage overhead. In this paper, we formalize an AAOSL originally described by Maniatis and Baker, and prove its key correctness properties. Our model and proofs are machine checked in Agda. Our proofs apply to a generalization of the original construction and provide confidence that instances of this generalization can be used in practice. Our formalization effort has also yielded some simplifications and optimizations.



CSR++: A Fast, Scalable, Update-Friendly Graph Data Structure

The graph model enables a broad range of analyses, thus graph processing is an invaluable tool in data analytics. At the heart of every graph-processing system lies a concurrent graph data structure storing the graph. Such a data structure needs to be highly efficient for both graph algorithms and queries. Due to the continuous evolution, the sparsity, and the scale-free nature of real-world graphs, graph-processing systems face the challenge of providing an appropriate graph data structure that enables both fast analytic workloads and low-memory graph mutations. Existing graph structures offer a hard trade-off between read-only performance, update friendliness, and memory consumption upon updates. In this paper, we introduce CSR++, a new graph data structure that removes these trade-offs and enables both fast read-only analytics and quick and memory-friendly mutations. CSR++ combines ideas from CSR, the fastest read-only data structure, and adjacency lists to achieve the best of both worlds. We compare CSR++ to CSR, adjacency lists from the Boost Graph Library, and LLAMA, a state-of-the-art update-friendly graph structure. In our evaluation, which is based on popular graph-processing algorithms executed over real-world graphs, we show that CSR++ remains close to CSR in read-only concurrent performance (within 10% on average), while significantly outperforming CSR (by an order of magnitude) and LLAMA (by almost 2x) with frequent updates.
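
The trade-off described above can be illustrated with the two baseline structures the abstract names, plain CSR and adjacency lists. The sketch below is not CSR++ itself, whose layout the abstract does not describe; the function names are chosen for this example only. It shows why CSR is compact and fast to scan but expensive to mutate, while an adjacency list is cheap to update at the cost of scattered storage.

```python
def build_csr(num_vertices, edges):
    """Build CSR arrays (offsets, neighbors) from a list of (src, dst) pairs."""
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1              # count out-degree per vertex
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]       # prefix sum gives start positions
    neighbors = [0] * len(edges)
    cursor = offsets[:-1]                  # next free slot per vertex (a copy)
    for src, dst in edges:
        neighbors[cursor[src]] = dst
        cursor[src] += 1
    return offsets, neighbors


def csr_neighbors(offsets, neighbors, v):
    """Out-neighbors of v: one contiguous slice, which is what makes scans fast."""
    return neighbors[offsets[v]:offsets[v + 1]]


def csr_insert_edge(offsets, neighbors, src, dst):
    """In-place insertion costs O(V + E): later entries shift, later offsets bump."""
    neighbors.insert(offsets[src + 1], dst)
    for v in range(src + 1, len(offsets)):
        offsets[v] += 1


def build_adjacency_list(num_vertices, edges):
    """One growable list per vertex: appending an edge is amortized O(1)."""
    adj = [[] for _ in range(num_vertices)]
    for src, dst in edges:
        adj[src].append(dst)
    return adj


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
    offsets, neighbors = build_csr(3, edges)
    csr_insert_edge(offsets, neighbors, 1, 0)
    print(offsets, neighbors, csr_neighbors(offsets, neighbors, 1))
```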



A Latina in Tech

Having started my Computer Science degree while growing up in Colombia and later completing it in Australia, I went from being an overrepresented Latina to being an underrepresented one. Further, the female to male ratio in CS in both countries was also rather different.

In this talk, as a mum, a wife, a teacher, a researcher, a manager and a leader, I share some of the lessons I have learnt throughout my career, with examples of successes and failures from my PhD, academic life, and industrial research life.



The University of Queensland and Oracle team up to develop world-class cyber security experts

The field of cyber security is coming of age, with more than a million job openings globally, including many in Australia, and a strong move from reactive to preventative security taking form. At The University of Queensland, teaming up with industry specialists like Oracle Labs – the research and development branch of global technology firm Oracle – will ensure both industry and researchers can focus on the real issues that businesses and users care about.

