Our Publications
Every year our researchers publish hundreds of papers to share their findings with the industry and the academic community. Our primary research areas are big data and machine learning, cloud computing, and programming languages.
Research Papers
Accurate Compilation Replay via Remote JIT Compilation
When a JIT compiler crashes in a production deployment, compiler developers wish to reproduce the problem locally. However, existing approaches to replay compilation lack the necessary accuracy for this use case, or they introduce too much of a maintenance burden. We propose to achieve accurate compilation replay by running a remote compilation, recording the input to the remote compiler, and replaying the compilation using the recorded data. The benefit is greatly reduced iteration times for compiler developers when such an issue occurs.
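As a rough illustration of the record-and-replay idea described in the abstract, here is a minimal Java sketch; the string-based request/response protocol and all class names are our own illustrative stand-ins, not the actual compiler interface.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Minimal sketch of compilation record-and-replay: during a remote
// compilation, every response served to the compiler is logged, so a
// crashing compilation can later be replayed from the recording alone.
final class CompilationRecorder {
    record Exchange(String request, String response) {}

    private final List<Exchange> log = new ArrayList<>();

    // Live mode: answer the compiler's request from the running VM and record it.
    String serve(String request, Function<String, String> liveVm) {
        String response = liveVm.apply(request);
        log.add(new Exchange(request, response));
        return response;
    }

    // Replay mode: iterate over the recorded exchanges deterministically,
    // with no live VM involved.
    Iterator<Exchange> replay() {
        return log.iterator();
    }
}
```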
Static Analysis for Java
Slides for the talk at the 2024 JVM Language Summit.
Finding Cuts in Static Analysis Graphs to Debloat Software
As software projects grow increasingly complex, debloating gains traction. While static analyses yield a coarse over-approximation of reachable code, approaches based on dynamic execution traces risk program correctness. By allowing the developer to reconsider only a few methods and still achieve a significant reduction in code size, cut-based debloating could minimize the risk. In this paper, we therefore propose the idea of finding small cuts in rule graphs of static analyses. After introducing an analysis with suitable semantics, we discuss how to encode its rules into a directed hypergraph. We then present an algorithm for efficiently finding the most effective single cut in the graph. The execution time of the proposed operations allows for deployment in interactive tools. Finally, we show that our graph model is able to expose heavy methods that are worth reconsidering.
Efficient Control-Flow Graph Traversal
In the process of program translation, compilers traverse a large number of control-flow graphs, so the speed of individual traversals can significantly impact overall compilation time. While standard graph-traversal algorithms such as Breadth-First Search (BFS) and Depth-First Search (DFS) run in time and space linear in the number of nodes and edges, their actual execution time and memory usage vary with the shape of the graph and the data structure used in the implementation. We analyze the time and space efficiency of control-flow graph traversals on graphs obtained by compiling Java and Scala programs with the Graal compiler. Our analysis shows that breadth-first traversal of control-flow graphs is up to 1.6 times faster than depth-first traversal and incurs lower memory overhead across all benchmark programs. We also demonstrate that the choice of data structure affects traversal speed, with a doubly linked list proving the most efficient across all benchmark programs.
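For concreteness, the sketch below shows a breadth-first CFG traversal in Java using an ArrayDeque; `BasicBlock` and its successor list are hypothetical stand-ins for a compiler's real IR classes. Replacing `add` (enqueue at the tail) with `push` (insert at the head) turns the same loop into a depth-first-style traversal, which is exactly the kind of traversal-order and data-structure choice the paper measures.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal sketch of a breadth-first control-flow-graph traversal.
final class BasicBlock {
    final int id;
    final List<BasicBlock> successors = new ArrayList<>();
    BasicBlock(int id) { this.id = id; }
}

final class CfgTraversal {
    // Visits every block reachable from the entry block exactly once,
    // in breadth-first order.
    static List<BasicBlock> bfs(BasicBlock entry) {
        List<BasicBlock> order = new ArrayList<>();
        Set<BasicBlock> visited = new HashSet<>();
        ArrayDeque<BasicBlock> worklist = new ArrayDeque<>();
        worklist.add(entry);
        visited.add(entry);
        while (!worklist.isEmpty()) {
            BasicBlock block = worklist.poll(); // FIFO: breadth-first
            order.add(block);
            for (BasicBlock succ : block.successors) {
                if (visited.add(succ)) {
                    worklist.add(succ); // worklist.push(succ) would give depth-first-style order
                }
            }
        }
        return order;
    }
}
```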
On the Impact of Lower Recall and Precision in Defect Prediction for Guiding Search-Based Software Testing
Defect predictors, static bug detectors and humans inspecting the code can propose locations in the program that are more likely to be buggy before they are discovered through testing. Automated test generators such as search-based software testing (SBST) techniques can use this information to direct their search for test cases to likely-buggy code, thus speeding up the process of detecting existing bugs in those locations. Often the predictions given by these tools or humans are imprecise, which can misguide the SBST technique and may deteriorate its performance. In this paper, we study the impact of imprecision in defect prediction on the bug detection effectiveness of SBST. Our study finds that the recall of the defect predictor, i.e., the proportion of correctly identified buggy code, has a significant impact on the bug detection effectiveness of SBST, with a large effect size. More precisely, the SBST technique detects 7.5 fewer bugs on average (out of 420 bugs) for every 5% decrease in recall. On the other hand, the effect of precision, a measure of false alarms, is not of meaningful practical significance, as indicated by a very small effect size. In the context of combining defect prediction and SBST, our recommendation is to increase the recall of defect predictors as a primary objective and precision as a secondary objective. In our experiments, we find that 75% precision is as good as 100% precision. To account for the imprecision of defect predictors, in particular low recall values, SBST techniques should be designed to search for test cases that also cover the predicted non-buggy parts of the program, while prioritising the parts that have been predicted as buggy.
Towards safeguarding software components from supply chain attacks
Software supply chain attacks exploit discrepancies between source code repositories and deployed artifacts, highlighting the need for rigorous integrity checks during the artifact's build process. As systems grow in complexity, preemptive measures are essential to ensure that the source code certifiably aligns with the deployed code. Modern software development relies heavily on third-party libraries sourced from registries like Maven Central, npm, and PyPI. However, these ecosystems have become prime targets for supply chain attacks that introduce malware into trusted packages or shadow them. Such attacks jeopardize both developers and users, compromising the integrity of their software supply chain. This presentation discusses recent supply chain attacks and proposed solutions. Additionally, we present Macaron, our open-source project from Oracle Labs offering a flexible checker framework and policy engine to detect and mitigate supply chain security threats, safeguarding software components and maintaining their security posture over the development lifecycle.
Scaling Type-Based Points-to Analysis with Saturation
Designing a whole-program static analysis requires trade-offs between precision and scalability. While a context-insensitive points-to analysis is often considered a good compromise, it still has non-linear complexity that leads to scalability problems when analyzing large applications. On the other hand, rapid type analysis scales well but lacks precision. We use saturation in a context-insensitive type-based points-to analysis to make it as scalable as a rapid type analysis, while preserving most of the precision of the points-to analysis. With saturation, the points-to analysis only propagates small points-to sets for variables. If a variable can have more values than a certain threshold, the variable and all its usages are considered saturated and no longer analyzed. Our implementation in the points-to analysis of GraalVM Native Image, a closed-world approach to build standalone binaries for Java applications, shows that saturation allows GraalVM Native Image to analyze large Java applications with hundreds of thousands of methods in less than two minutes.
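The following minimal Java sketch illustrates the saturation mechanism as we understand it from the abstract: small points-to sets are tracked per variable, and once a set outgrows a threshold the variable saturates and individual types are no longer propagated. The class name, threshold value, and string-based type representation are illustrative assumptions, not the GraalVM Native Image implementation.

```java
import java.util.HashSet;
import java.util.Set;

// Minimal sketch of saturation in a type-based points-to analysis.
final class TypeFlow {
    static final int SATURATION_THRESHOLD = 16; // illustrative value

    private Set<String> types = new HashSet<>(); // type names reaching this flow
    private boolean saturated = false;

    // Adds a type; once the set outgrows the threshold the flow saturates
    // and the analysis stops tracking (and propagating) individual types.
    boolean addType(String type) {
        if (saturated) {
            return false; // already saturated: nothing more to propagate
        }
        boolean changed = types.add(type);
        if (types.size() > SATURATION_THRESHOLD) {
            saturated = true;
            types = null; // drop the set; all usages now treat this flow as "any type"
        }
        return changed;
    }

    boolean isSaturated() { return saturated; }
}
```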
GraalSP: Polyglot, Efficient, and Robust Machine Learning-Based Static Profiler
Compilers use profiles to apply profile-guided optimizations and produce efficient programs. Dynamic profilers collect high-quality profiles but require identifying suitable profile collection workloads, introduce additional complexity to the application build pipeline, and cause significant time and memory overheads. Modern static profilers use machine learning (ML) models to predict profiles and mitigate these issues. However, state-of-the-art ML-based static profilers handcraft features, which are platform-specific and challenging to adapt to other architectures and programming languages. They use computationally expensive deep neural network models, thus increasing application compile time. Furthermore, they can introduce performance degradation in the compiled programs due to inaccurate profile predictions. We present GraalSP, a portable, polyglot, efficient, and robust ML-based static profiler. GraalSP is portable as it defines features on a high-level, graph-based intermediate representation and semi-automates the definition of features. For the same reason, it is also polyglot and can operate on any language that compiles to Java bytecode (such as Java, Scala, and Kotlin). GraalSP is efficient as it uses an XGBoost model based on lightweight decision tree models, and robust as it uses branch probability prediction heuristics to ensure the high performance of compiled programs. We integrated GraalSP into the Graal compiler and achieved a geometric mean execution time speedup of 7.46% compared to a default configuration of the Graal compiler.
Synthesis of Allowlists for Runtime Protection against SQLi
Data is the new oil. This metaphor is commonly used to highlight the fact that data is a highly valuable commodity. Nowadays, much of worldwide data sits in SQL databases and transits through web-based applications of all kinds. As the value of data increases and attracts more attention from malicious actors, application protections against SQL injections need to become more sophisticated. Although SQL injections have been known for many years, they are still one of the top security vulnerabilities. For example, in 2022 more than 1000 CVEs related to SQL injection were reported. We propose a runtime application protection approach that infers and constrains the information that can be disclosed by database-backed applications. Where existing approaches use syntax or hand-crafted features as a proxy for information disclosure, we propose a lightweight information disclosure model that faithfully captures the semantics of SQL and achieves finer-grained security.
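As background, here is a minimal Java sketch of the classical template-based allowlist idea that such runtime protections build on; the paper's contribution is a semantic information-disclosure model that goes beyond this kind of syntactic proxy, and the regex-based literal masking below is a deliberate simplification of what a real system would do on a parsed SQL AST.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of runtime allowlisting of SQL statements by template.
final class SqlAllowlist {
    private final Set<String> templates = ConcurrentHashMap.newKeySet();

    // Reduce a concrete query to a template by masking literal values.
    static String template(String sql) {
        return sql.replaceAll("'[^']*'", "?")     // string literals
                  .replaceAll("\\b\\d+\\b", "?"); // numeric literals
    }

    // Learning phase: record the templates of legitimate queries.
    void learn(String sql) { templates.add(template(sql)); }

    // Enforcement phase: only queries whose template was seen during
    // learning are allowed through to the database.
    boolean isAllowed(String sql) { return templates.contains(template(sql)); }
}
```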
Distributed Asynchronous Regular Path Queries (RPQs) on Graphs
Graph pattern-matching queries enable flexible graph exploration and analysis, similar to what SQL provides for relational databases. One of the most expressive and powerful constructs in graph querying is regular path queries, also called RPQs. RPQs enable support for variable-length path patterns based on regular expressions, such as (p1:person)-/:knows+/->(p2:person), which searches for paths of any non-zero length between two persons. In this paper, we introduce a novel design for distributed RPQs that builds on top of distributed asynchronous pipelined traversals to enable (i) memory control of path explorations, with (ii) great performance and scalability. We evaluate our system and show that with sixteen machines, it outperforms Neo4j by 82 times on average and a relational implementation of the same queries in PostgreSQL by 71 times, while maintaining low memory consumption.
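To illustrate the single-machine semantics of such an RPQ, here is a minimal Java sketch that evaluates (p1)-/:knows+/->(p2) from one source vertex; the adjacency-map graph representation is an illustrative assumption, and the paper's actual contribution (distributed, asynchronous, memory-controlled exploration) is not modeled here.

```java
import java.util.*;

// Minimal sketch of Kleene-plus RPQ evaluation: find every vertex
// reachable from a source via one or more "knows" edges.
final class RpqSketch {
    record Edge(String label, int target) {}

    static Set<Integer> knowsPlus(Map<Integer, List<Edge>> graph, int source) {
        Set<Integer> reached = new HashSet<>();
        ArrayDeque<Integer> frontier = new ArrayDeque<>();
        frontier.add(source);
        while (!frontier.isEmpty()) {
            int v = frontier.poll();
            for (Edge e : graph.getOrDefault(v, List.of())) {
                if (e.label().equals("knows") && reached.add(e.target())) {
                    frontier.add(e.target()); // extend the path by one edge
                }
            }
        }
        return reached; // every member matches :knows+ from the source
    }
}
```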
Towards an Abstraction for Verifiable Credentials and Zero Knowledge Proofs
Most standards efforts and projects around Verifiable Credentials either do not enable use of Zero Knowledge Proofs to balance privacy and accountability, or are too tightly tied to specific cryptographic libraries, which limits choice, flexibility, progress and sustainability. For example, if a project targets a cryptographic library that stops being maintained or otherwise becomes an undesirable dependency, these events can threaten the sustainability of the whole project. We are working on an abstraction to address this problem, which has additional benefits such as making it much simpler to express and understand use case requirements, especially for people without expertise in using specific cryptography libraries. These slides share some of our observations, ideas, experience and opinions so far.
Macaron: A Logic-based Framework for Software Supply Chain Security Assurance
Many software supply chain attacks exploit the fact that what is in a source code repository may not match the artifact that is actually deployed in one’s system. This paper describes a logic-based framework that analyzes a software component and its dependencies to determine if they are built in a trustworthy fashion. The properties that are checked include the availability of build provenances and whether the build and deployment process of an artifact is tamper resistant. These properties are based on the open-source community efforts, such as SLSA, that enable an incremental approach to improve supply chain security. We evaluate our tool on the top-30 Java, Python, and npm open-source projects and show that the majority still do not produce provenances. Our evaluation also shows that a large number of open-source Java and Python projects do not have a transparent build platform to produce artifacts, which is a necessary requirement to increase the trust in the published artifacts. We show that our tool fills a gap in the current software supply chain security landscape, and by making it publicly available the open-source community can both benefit from and contribute to it.
Remote Just-in-Time Compilation for Dynamic Languages
Cloud platforms allow applications to meet fluctuating levels of demand while maximizing hardware occupancy at the same time. These deployment models are characterized by short-lived applications running in resource-constrained environments. This poses a challenge for dynamic languages with just-in-time (JIT) compilation. Dynamic-language runtimes suffer from a warmup phase and resource-usage peaks caused by JIT compilation. Offloading compilation jobs to a dedicated server is a possible mitigation for these problems. We propose leveraging remote JIT compilation as a means to enable coordination between the independent instances. By sharing compilation results, aggregating profiles, and adapting the compiler and compilation policy, we strive to improve peak performance and further reduce warmup times. Additionally, an implementation on top of the Truffle framework enables us to bring these benefits to many popular languages.
Role of Program Analysis in Security Vulnerability Detection: Then and Now
Program analysis techniques play an important role in detecting security vulnerabilities. In this paper we describe our experiences in developing such tools that can be used in an industrial setting. The main driving forces for adoption are low false positive rate, ease of integration in the developer's workflow and results that are easy to understand. We also show how program analysis tools had to evolve with the evolving needs of the organisation. We conclude with our vision on how program analysis tools will be melded with DevSecOps.
Smoothing Entailment Graphs with Language Models
The diversity and Zipfian frequency distribution of natural language predicates in corpora leads to sparsity in Entailment Graphs (EGs) built by Open Relation Extraction (ORE). EGs are theoretically founded and computationally efficient, but as symbolic models for natural language inference, they fail if a novel premise or hypothesis vertex is missing at test-time. We introduce a theory of optimal graph smoothing to overcome vertex sparsity by constructing transitive chains. We then demonstrate an efficient, open-domain smoothing method using an off-the-shelf Language Model to find approximations of missing premise predicates, improving recall by 25.1 and 16.3 percentage points on two difficult directional entailment datasets while raising average precision. Further, in a recent QA task, we show that EG smoothing is most useful for answering questions with less supporting text, where missing predicates are more costly. Finally, in controlled experiments with WordNet we show that hypothesis smoothing is difficult, but possible in principle.
Taming Multi-GPU Greedy Scheduling Through a Polyglot Runtime
Multi-GPU systems are increasingly being deployed in cloud data centers, but using GPUs efficiently from high-level programming languages remains a challenge. Moreover, exploiting the full capabilities of multi-GPU systems is an arduous task due to the complex interconnection topology between available accelerators and the variety of inter-GPU communication patterns exhibited by different workloads. This work introduces a novel scheduler for multi-task GPU computations that provides transparent asynchronous execution on multi-GPU systems without requiring prior information about the program dependencies or the underlying system architecture. Our scheduler integrates with the polyglot GraalVM ecosystem and is therefore available for multiple high-level languages, providing a general framework that can significantly lower the barrier to entry to multi-GPU acceleration. We validate our work on a set of benchmarks designed to investigate scalability and inter-GPU communication. Experimental results show that our scheduler automatically achieves 80-90% of the peak performance of hand-optimized CUDA host code on Volta and Ampere multi-GPU systems.
Diagnosing Compiler Performance by Comparing Optimization Decisions
Modern compilers apply a set of optimization passes aiming to speed up the generated code. The combined effect of individual optimizations is hard to predict. Thus, changes to a compiler's code may hinder the performance of generated code as an unintended consequence. Performance regressions are often related to misapplied optimizations. The regressions are hard to investigate, considering the vast number of compilation units and applied optimizations. Additionally, a method may be part of several compilation units and optimized differently in each. Moreover, compiled methods and inlining decisions are not invariant across runs of the virtual machine (VM). We propose to solve the problem of diagnosing performance regressions by capturing the compiler's optimization decisions. We do so by representing the applied optimization phases, optimization decisions, and inlining decisions in the form of trees. This paper introduces an approach utilizing tree edit distance (TED) to detect optimization differences in a semi-automated way. We present an approach to compare optimization decisions in differently-inlined methods. We employ these techniques to pinpoint the causes of performance problems in various benchmarks of the Graal compiler.
Generating Java Interfaces for Accessing Foreign Objects
Language interoperability (e.g., calling Python methods from Java programs) is a critical challenge in software development, often leading to code inconsistencies, human errors, and reduced readability. This paper presents a work-in-progress project aimed at addressing this issue by providing a tool that automates the generation of Java interfaces for accessing data and methods written in other languages. Using existing code analysis techniques, the tool aims to produce easy-to-use abstractions for interop, intended to reduce human error and improve code clarity. Although the tool is not yet finished, it has already shown promising results. Initial evaluations demonstrate its ability to identify language-specific features and automatically generate equivalent Java interfaces. This allows developers to efficiently integrate code written in foreign languages into Java projects while maintaining code readability and minimizing errors.
GraalVM Scripting Languages as Maven Dependencies
The GraalVM project serves as an umbrella for a diverse set of interesting technologies, all built around the GraalVM compiler. The most well-known at this time is the GraalVM native-image tool. However, there is also the GraalVM JIT compiler, a drop-in replacement for HotSpot's C2 compiler, and implementations of additional GraalVM languages such as Python (GraalPy) and JavaScript (GraalJS). These additional GraalVM languages can be used as standalone distributions or through a Java embedding API, allowing for the extension of Java applications with these languages. For instance, you can offer Python scripting capabilities to users of your Java application. Even if such an application is compiled with native-image, it retains the ability to dynamically load, execute, and even JIT compile Python scripts. With its recent release, the GraalVM project was restructured, decoupling the additional languages from its core. We now have the GraalVM JDK distribution, which is an OpenJDK build enhanced with native-image, and the additional languages are delivered as Maven dependencies, compatible not only with GraalVM JDK but also with OpenJDK and OracleJDK, albeit with some caveats. In this session, we will explore how to use GraalVM languages as Maven dependencies and showcase their potential to enhance Java applications.
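For readers unfamiliar with the embedding API mentioned above, this is roughly what evaluating Python from Java looks like with the org.graalvm.polyglot API, assuming the GraalPy Maven artifacts are on the class path (exact coordinates depend on the release):

```java
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Value;

// Embedding GraalPy in a Java application via the GraalVM polyglot API.
public class PolyglotExample {
    public static void main(String[] args) {
        // The context manages the guest-language engine; try-with-resources
        // ensures it is closed and its resources released.
        try (Context context = Context.newBuilder("python").build()) {
            Value result = context.eval("python", "2 ** 10");
            System.out.println(result.asInt()); // prints 1024
        }
    }
}
```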
Comparing Rapid Type Analysis with Points-To Analysis in GraalVM Native Image
Whole-program analysis is an essential technique that enables advanced compiler optimizations. An important example of such a method is points-to analysis used by ahead-of-time (AOT) compilers to discover program elements (classes, methods, fields) used on at least one program path. GraalVM Native Image uses a points-to analysis to optimize Java applications, which is a time-consuming step of the build. We explore how much the analysis time can be improved by replacing the points-to analysis with a rapid type analysis (RTA), which computes reachable elements faster by allowing more imprecision. We propose several extensions of previous approaches to RTA: making it parallel, incremental, and supporting heap snapshotting. We present an extensive experimental evaluation of the effects of using RTA instead of points-to analysis, in which RTA allowed us to reduce the analysis time for Spring Petclinic (a popular demo application of the Spring framework) by 64% and the overall build time by 35% at the cost of increasing the image size due to the imprecision by 15%.
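A minimal Java sketch of the core RTA idea as we understand it: reachability is computed as a fixpoint over two sets, reachable methods and instantiated types, and a virtual call is resolved against every override in an instantiated type regardless of which values actually flow to the receiver (the source of RTA's imprecision). The program model below is illustrative, not Native Image's.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Minimal sketch of rapid type analysis as a naive fixpoint iteration.
final class RtaSketch {
    record Method(String name, List<String> instantiatedTypes, List<String> calls) {}

    static Set<String> reachable(Map<String, Method> program,
                                 Map<String, Map<String, String>> overrides, // call -> (type -> impl)
                                 String entry) {
        Set<String> instantiated = new HashSet<>();
        Set<String> reachable = new HashSet<>(Set.of(entry));
        boolean changed = true;
        while (changed) { // iterate until no new methods or types are found
            changed = false;
            for (String name : new ArrayList<>(reachable)) {
                Method m = program.get(name);
                if (m == null) continue;
                changed |= instantiated.addAll(m.instantiatedTypes());
                for (String call : m.calls()) {
                    for (Map.Entry<String, String> e :
                            overrides.getOrDefault(call, Map.of()).entrySet()) {
                        // RTA: a virtual call can reach any override defined
                        // in a type instantiated somewhere in the program.
                        if (instantiated.contains(e.getKey())) {
                            changed |= reachable.add(e.getValue());
                        }
                    }
                }
            }
        }
        return reachable;
    }
}
```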
Automated Machine Learning with Explainability
ML has revolutionized a large range of industry applications with new techniques for consuming complex data modalities, such as images and text. However, for a given dataset and business use case, non-technical users face adoption-limiting questions, such as which model to use and how to set its hyper-parameters. These questions are challenging and time-consuming even for seasoned data scientists. The AutoMLx team at Oracle Labs has developed an automated machine learning pipeline with explainability tools built in for novice and advanced users. In this talk, we provide an overview of our current and upcoming AutoMLx features and some applications; for example, how to predict construction site delays and how to forecast CPU resource usage based on previous consumption trends.
Why Is Static Application Security Testing Hard to Learn?
In this article, we summarize our experience in combining program analysis with machine learning (ML) to develop a technique that can improve the development of specific program analyses. Our experience is negative. We describe the areas that need to be addressed if ML techniques are to be useful in the program analysis context. Most of the issues we report differ from those discussed in the state-of-the-art literature on using ML techniques to detect security vulnerabilities.
Security Research: Program Analysis Meets Security
In this paper we present the key features of some of the security analysis tools developed at Oracle Labs. These include Parfait, a static analyser; Affogato, a dynamic analysis tool based on run-time instrumentation of Node.js applications; and Gelato, a dynamic analysis tool that inspects only the client-side code written in JavaScript. We show how these tools can be integrated at different phases of the software development life-cycle. This paper is based on the presentation at the ICTAC school in 2021.
Vibration Resonance Spectrometry (VRS) for the Advanced Streaming Detection of Rotor Unbalance
Determination of the diagnosis thresholds is crucial for the fault diagnosis of industry assets. Rotor machines under different working conditions are especially challenging because of the dynamic torque and speed. In this paper, an advanced machine learning based signal processing innovation termed the multivariate state estimation technique is proposed to improve the accuracy of the diagnosis thresholds. A novel preprocessing technique called vibration resonance spectrometry is also applied to achieve a low-computation-cost capability for real-time condition monitoring. The monitoring system that utilizes the above methods is then applied for prognostics of a fan model as an example. Different levels of radial unbalance were added on the fan and tested, and then compared with the healthy state. The results show that the proposed methodology can detect the unbalance with good accuracy and low computation cost. The proposed methodology can be applied to complex engineering assets for better predictive monitoring that could be processed with on-premise edge devices, or eventually a cloud platform due to its capacity for lossless dimension reduction.
Better Distributed Graph Query Planning With Scouting Queries
Query planning is essential for graph query execution performance. In distributed graph processing, data partitioning and messaging significantly influence performance. However, these aspects are difficult to model analytically, which makes query planning especially challenging. This paper introduces scouting queries, a lightweight mechanism to gather runtime information about different query plans, which can then be used to choose the "best" plan. In a depth-first-oriented graph processing engine, scouting queries typically execute for a brief amount of time with negligible overhead. Partial results can be reused to avoid redundant work. We evaluate scouting queries and show that they bring speedups of up to 8.7× for heavy queries, while adding low overhead for queries that do not benefit.
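A minimal Java sketch of the scouting idea under stated assumptions: each candidate plan runs under a small time budget, and the plan that made the most progress is chosen. The `Plan` interface and its progress counter are hypothetical stand-ins for engine internals, and partial-result reuse is omitted.

```java
import java.util.List;

// Minimal sketch of choosing a query plan via scouting runs.
final class PlanScout {
    interface Plan {
        // Executes briefly under the given budget and reports progress,
        // e.g., the number of result tuples produced so far.
        long run(long budgetNanos);
    }

    static Plan choose(List<Plan> candidates, long budgetNanos) {
        Plan best = null;
        long bestProgress = -1;
        for (Plan plan : candidates) {
            long progress = plan.run(budgetNanos); // scouting phase
            if (progress > bestProgress) {
                bestProgress = progress;
                best = plan;
            }
        }
        return best; // the chosen plan is then executed to completion
    }
}
```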
Towards Intelligent Application Security
Over the past 20 years we have seen application security evolve from analysing application code through Static Application Security Testing (SAST) tools, to detecting vulnerabilities in running applications via Dynamic Application Security Testing (DAST) tools. The past 10 years have seen new flavours of tools that provide combinations of static and dynamic analysis via Interactive Application Security Testing (IAST), examination of the components and libraries of the software via Software Composition Analysis (SCA), protection of web applications and APIs using signature-based Web Application Firewalls (WAF), and monitoring of the application and blocking of attacks through Runtime Application Self Protection (RASP) techniques. The past 10 years have also seen an increase in the uptake of the DevOps model, which combines software development and operations to provide continuous delivery of high-quality software. As security has become more important, the DevOps model has evolved into the DevSecOps model, where software development, operations and security are all integrated. There has also been increasing use of learning techniques, including machine learning and program synthesis. Several tools have been developed that make use of machine learning to help developers make quality decisions about their code, tests, or the runtime overhead their code produces. However, such techniques have not yet been applied to application security. In this talk I discuss how to provide an automated approach to integrating security into all aspects of application development and operations, aided by learning techniques. This incorporates signals from code, operations and beyond, and automation, to provide actionable intelligence to developers, security analysts, operations staff, and autonomous systems. I will also consider how malware and threat intelligence can be incorporated into this model to support Intelligent Application Security in a rapidly evolving world. Bio: https://labs.oracle.com/pls/apex/f?p=94065:11:8452080560451:21 LinkedIn: https://www.linkedin.com/in/drcristinacifuentes/ Twitter: @criscifuentes
A Reachability Index for Recursive Label-Concatenated Graph Queries
Reachability queries checking the existence of a path from a source node to a target node are fundamental operators for querying and processing graph data. Current approaches for index-based evaluation of reachability queries either focus on plain reachability or constraint-based reachability with only alternation of labels. In this paper, for the first time we study the problem of index-based processing for recursive label-concatenated reachability queries, referred to as RLC queries. These queries check the existence of a path that can satisfy the constraint defined by a concatenation of at most k edge labels under the Kleene plus. Many practical graph database and network analysis applications exhibit RLC queries. However, their evaluation remains prohibitive in current graph database engines. We introduce the RLC index, the first reachability index to efficiently process RLC queries. The RLC index checks whether the source vertex can reach an intermediate vertex that can also reach the target vertex under a recursive label-concatenated constraint. We propose an indexing algorithm to build the RLC index, which guarantees the soundness and the completeness of query execution and avoids recording redundant index entries. Comprehensive experiments on real-world graphs show that the RLC index can significantly reduce both the offline processing cost and the memory overhead of transitive closure, while improving query processing by up to six orders of magnitude over online traversals. Finally, our open-source implementation of the RLC index significantly outperforms current mainstream graph engines for evaluating RLC queries.
Oracle AutoMLx
This presentation introduces Oracle Labs' AutoMLx package to an audience of university students.
AutoML on the Half Shell: How Are Our Oysters?
This is a presentation to be given at the Analytics and Data Summit 2023 (Redwood Shores, CA, March 14, 2023). It combines two public talks from CloudWorld 2022: 1. a general AutoMLx overview, and 2. a specific ML use case in which an oyster dataset (in collaboration with the University of New Orleans) is used to showcase AutoML in Oracle Machine Learning (OML). The task is to predict health risks for oysters. We are allowed to use the dataset as we have a signed DUA between Oracle and the University of New Orleans.
Presentation of Prognostic and Health Management System in AeroConf 2023
Oracle has an anomaly detection solution for monitoring time-series telemetry signals for dense-sensor IoT prognostic applications. It integrates an advanced prognostic pattern recognition technique called Multivariate State Estimation Technique (MSET) for high-sensitivity prognostic fault monitoring applications in commercial nuclear power and aerospace applications. MSET has since been spun off and met with commercial success for prognostic Machine Learning (ML) applications in a broad range of safety critical applications, including NASA space shuttles, oil-and-gas asset prognostics, and commercial aviation streaming prognostics. MSET proves to possess significant advantages over conventional ML solutions including neural networks, autoassociative kernel regression, and support vector machines. The main advantages include earlier warning of incipient anomalies in complex time-series signatures, and much lower overhead compute cost due to the deterministic mathematical structure of MSET. Both are crucial for dense-sensor avionic IoT prognostics. In addition, Oracle has developed an extensive portfolio of data preprocessing innovations around MSET to solve the common big-data challenges that cause conventional ML algorithms to perform poorly regarding prognostic accuracy (i.e., false/missed-alarm probabilities). Oracle's MSET-based prognostic solution helps increase avionic reliability margins and system availability objectives while reducing costly sources of "no fault found" events that have become a significant sparing-logistics issue for many industries including aerospace and avionics. Moreover, by utilizing and correlating information from all on-board telemetry sensors (e.g., distributed pressure, voltage, temperature, current, airflow and hydraulic flow), MSET is able to provide the best possible prediction of failure precursors and onset of small degradation for the electronic components used on aircraft, benefiting the aviation Prognostics and Health Management (PHM) system.
Smoothing Entailment Graphs with Language Models
The diversity and Zipfian frequency distribution of natural language predicates in corpora leads to sparsity when learning Entailment Graphs. As symbolic models for natural language inference, an EG cannot recover if missing a novel premise or hypothesis at test-time. In this paper we approach the problem of vertex sparsity by introducing a new method of graph smoothing, using a Language Model to find the nearest approximations of missing predicates. We improve recall by 25.1 and 16.3 absolute percentage points on two difficult directional entailment datasets while exceeding average precision, and show a complementarity with other improvements to edge sparsity. On an extrinsic QA task, we show that smoothing benefits the lower-resource questions, those with less available context. We further analyze language model embeddings and discuss why they are naturally suitable for premise-smoothing, but not hypothesis smoothing. Finally, we formalize a theory for smoothing a symbolic inference method by constructing transitive chains to smooth both the premise and hypothesis.
Introduction to graph processing with PGX (guest lecture at ENSIMAG)
Graph processing is already an integral part of big-data analytics, mainly because graphs can naturally represent data that capture fine-grained relationships among entities. Graph analysis can provide valuable insights about such data by examining these relationships. In this presentation, we will first introduce the concept of graphs and illustrate why and how graph processing can be a valuable tool for data scientists. We will then describe the differences between graph analytics/algorithms (such as Pagerank [1]) and graph queries (such as `(:person)-[:friend]->(:person)`). Second, we will summarize the different tools and technologies included in our Oracle Labs PGX [2] project and show how they provide efficient solutions to the main graph-processing problems. Finally, we will describe a few current and future directions in graph processing, including graph machine learning and distributed graphs (that could potentially lead to great topics for internships).
Improving Inference Performance of Machine Learning with the Divide-and-Conquer Principle
Many popular machine learning models scale poorly when deployed on CPUs. In this paper we explore the reasons why and propose a simple, yet effective approach based on the well-known Divide-and-Conquer Principle to tackle this problem of great practical importance. Given an inference job, instead of using all available computing resources (i.e., CPU cores) for running it, the idea is to break the job into independent parts that can be executed in parallel, each with the number of cores according to its expected computational cost. We implement this idea in the popular OnnxRuntime framework and evaluate its effectiveness with several use cases, including the well-known models for optical character recognition (PaddleOCR) and natural language processing (BERT).
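A minimal Java sketch of the divide-and-conquer idea using a standard thread pool: the inference job is split into independent parts, each submitted as its own task, rather than handing all cores to a single call. How the job is split and how many cores each part gets are application-specific and omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Minimal sketch: run independent parts of an inference job in parallel
// instead of dedicating all cores to one monolithic call.
final class DivideAndConquerInference {
    static <T> List<T> infer(List<Callable<T>> parts, int parallelism)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<T>> futures = pool.invokeAll(parts); // fork
            List<T> results = new ArrayList<>();
            for (Future<T> f : futures) {
                results.add(f.get()); // join: gather per-part outputs
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```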
Exploring topic models to discern zero-day vulnerabilities on Twitter through a case study on log4shell
Twitter has demonstrated advantages in providing timely information about zero-day vulnerabilities and exploits. The large volume of unstructured tweets, on the other hand, makes it difficult for cybersecurity professionals to perform manual analysis and investigation into critical cyberattack incidents. To improve the efficiency of data processing on Twitter, we propose a novel vulnerability discovery and monitoring framework that can collect and organize unstructured tweets into semantically related topics with temporal dynamic patterns. Unlike existing supervised machine learning methods that process tweets based on a labelled dataset, our framework is unsupervised, making it better suited for analyzing emerging cyberattack and vulnerability incidents when no prior knowledge is available (e.g., zero-day vulnerabilities and incidents). The proposed framework compares three topic modeling techniques (Latent Dirichlet Allocation, Non-negative Matrix Factorization, and Contextualized Topic Modeling) in combination with different text representation methods (bag-of-words and contextualized pre-trained language models) on a Twitter dataset collected from 47 influential users in the cybersecurity community. We show how the proposed framework can be used to analyze a critical zero-day vulnerability incident (Log4shell) in the Apache log4j Java library in order to understand its temporal evolution and dynamic patterns across its vulnerability life-cycle. Results show that our proposed framework can be used to effectively analyze vulnerability-related topics and their dynamic patterns. Twitter can reveal valuable information regarding early indicators of exploits and user behaviors. The pre-trained contextualized text representation shows advantages for the unstructured, domain-dependent, sparse Twitter textual data in the cybersecurity domain.
Distributed Graph Processing with PGX.D (2022)
Graph processing is one of the top data analytics trends. In particular, graph processing comprises two main styles of analysis, namely graph algorithms and graph pattern-matching queries. Classic graph algorithms, such as Pagerank, repeatedly traverse the vertices and edges of the graph and calculate some desired (mathematical) function. Graph queries enable the interactive exploration and pattern matching of graphs. For example, queries like `SELECT p1.name, p2.name FROM MATCH (p1:person)-[:friend]->(p2:person) WHERE p1.country = p2.country` combine the classic operations found in SQL with graph patterns. Both algorithms and queries are very challenging workloads, especially in a distributed setting, where very large graphs are partitioned across multiple machines. In this lecture, I will present how the distributed PGX [1] engine (known as PGX.D; developed at Oracle Labs [2] Zurich) implements efficient algorithms and queries and solves problems, such as data skew and intermediate-result explosion. In brief, for graph algorithms, PGX.D offers the functionality to compile simple sequential textbook-style Green-Marl [3] algorithms to efficient distributed execution. For queries, PGX.D includes a depth-first asynchronous computation runtime [4] that enables limiting the amount of intermediate data during query execution to essentially support "any-size" patterns. [1] http://www.oracle.com/technetwork/oracle-labs/parallel-graph-analytix/overview/index.html [2] https://labs.oracle.com [3] Green-Marl: A DSL for easy and efficient graph analysis, ASPLOS'12. [4] aDFS: An Almost Depth-First-Search Distributed Graph-Querying System. USENIX ATC'21.
EMNLP'22 Presentation of Proxy Clean Work: Mitigating Bias by Proxy in Pre-Trained Models
Transformer-based pre-trained models are known to encode societal biases, not only in their contextual representations but also in their downstream predictions when fine-tuned on task-specific data. We present D-BIAS, an approach that selectively eliminates stereotypical associations (e.g., co-occurrence statistics) at fine-tuning, such that the model doesn't learn to excessively rely on those signals. D-BIAS attenuates biases from both identity words and frequently co-occurring proxies, which we select using pointwise mutual information. We apply D-BIAS to a) occupation classification, and b) toxicity classification and find that our approach substantially reduces downstream biases (> 60% in toxicity classification for identities that are most frequently flagged as toxic on online platforms). In addition, we show that D-BIAS dramatically improves upon scrubbing, i.e., removing only the identity words in question. We also demonstrate that D-BIAS easily extends to multiple identities and achieves competitive performance with two recently proposed debiasing approaches: R-LACE and INLP.
Feeling Validated: Constructing Validation Sets for Few-Shot Intent Classification
We study validation set construction via data augmentation in true few-shot intent classification. Empirically, we demonstrate that with scarce data, model selection via a moderate number of generated examples consistently leads to higher test set accuracy than either model selection via a small number of held-out training examples, or selection of the model with the lowest training loss. For each of these methods of model selection -- including validation sets built from task-agnostic data augmentation -- validation accuracy provides a significant overestimate of test set accuracy. To support better estimates and effective model selection, we propose PanGeA, a generative method for domain-specific augmentation that is trained once on out-of-domain data, and then employed for augmentation on any domain-specific dataset. In experiments with 6 datasets that have been subsampled to both 5 and 10 examples per class, we show that PanGeA is better than or competitive with other methods in terms of model selection while also facilitating higher fidelity estimates of test set accuracy.
Feeling Validated: Constructing Validation Sets for Few-Shot Learning
We study validation set construction via data augmentation in true few-shot text classification. Empirically, we show that task-agnostic methods---known to be ineffective for improving test set accuracy for state-of-the-art models when used to augment the training set---are effective for model selection when used to build validation sets. However, accuracy on validation sets synthesized via these techniques does not provide a good estimate of test set accuracy. To support better estimates, we propose DAugSS, a generative method for domain-specific data augmentation that is trained once on task-agnostic data and then employed for augmentation on any data set, by using provided training examples and a set of guide words as a prompt. In experiments with 6 data sets, both 5 and 10 examples per class, training the last layer weights and full fine-tuning, and the choice of 4 continuous-valued hyperparameters, DAugSS is better than or competitive with other methods of validation set construction, while also facilitating better estimates of test set accuracy.
A Multi-Target, Multi-Paradigm DSL Compiler for Algorithmic Graph Processing
Domain-specific language compilers need to close the gap between the domain abstractions of the language and the low-level concepts of the target platform. This can be challenging to achieve for compilers targeting multiple platforms with potentially very different computing paradigms. In this paper, we present a multi-target, multi-paradigm DSL compiler for algorithmic graph processing. Our approach centers around an intermediate representation and reusable, composable transformations to be shared between the different compiler targets. These transformations embrace abstractions that align closely with the concepts of a particular target platform, and disallow abstractions that are semantically more distant. Our compiler supports four different target platforms, each involving a different computing paradigm. We report on our experience implementing the compiler and highlight some of the challenges and requirements for applying language workbenches in industrial use cases.
Subject Level Differential Privacy with Hierarchical Gradient Averaging
Subject Level Differential Privacy (DP) is a granularity of privacy recently studied in the Federated Learning (FL) setting, where a subject is defined as an individual whose private data is embodied by multiple data records that may be distributed across a multitude of federation users. This granularity is distinct from item level and user level privacy appearing in the literature. Prior work on subject level privacy in FL focuses on algorithms that are derivatives of group DP or enforce user level Local DP (LDP). In this paper, we present a new algorithm – Hierarchical Gradient Averaging (HiGradAvgDP) – that achieves subject level DP by constraining the effect of individual subjects on the federated model. We prove the privacy guarantee for HiGradAvgDP and empirically demonstrate its effectiveness in preserving model utility on the FEMNIST and Shakespeare datasets. We also report, for the first time, a unique problem of privacy loss composition, which we call horizontal composition, that is relevant only to subject level DP in FL. We show how horizontal composition can adversely affect model utility by either increasing the noise necessary to achieve the DP guarantee, or by constraining the amount of training done on the model.
Private and Robust Federated Learning using Private Information Retrieval and Norm Bounding
Federated Learning (FL) is a distributed learning paradigm that enables mutually untrusting clients to collaboratively train a common machine learning model. Client data privacy is paramount in FL. At the same time, the model must be protected from poisoning attacks from adversarial clients. Existing solutions address these two problems in isolation. We present FedPerm, a new FL algorithm that addresses both these problems by combining norm bounding for model robustness with a novel intra-model parameter shuffling technique that amplifies data privacy by means of Private Information Retrieval (PIR) based techniques that permit cryptographic aggregation of clients’ model updates. The combination of these techniques helps the federation server constrain parameter updates from clients so as to curtail effects of model poisoning attacks by adversarial clients. We further present FedPerm’s unique hyperparameters that can be used effectively to trade off computation overheads with model utility. Our empirical evaluation on the MNIST dataset demonstrates FedPerm’s effectiveness over existing Differential Privacy (DP) enforcement solutions in FL.
Machine Learning in Java
An overview of Java and Machine Learning, covering why you might want to write ML applications in more structured languages, what ML tools are available in the Java ecosystem, and some of the recent preview features in the JDK which improve numerical performance.
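One example of the numerics-oriented JDK features this talk touches on is the Vector API (an incubator module in recent JDK releases; whether the talk covers exactly this API is our assumption). A sketch of a SIMD dot product follows; compile and run it with --add-modules jdk.incubator.vector.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Dot product using the JDK Vector API: the main loop processes one
// hardware vector's worth of floats per iteration.
public final class DotProduct {
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    static float dot(float[] a, float[] b) {
        float sum = 0f;
        int i = 0;
        int upper = SPECIES.loopBound(a.length);
        for (; i < upper; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            sum += va.mul(vb).reduceLanes(VectorOperators.ADD);
        }
        for (; i < a.length; i++) { // scalar tail for leftover elements
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```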
Automatically Deriving JavaScript Static Analyzers from Specifications using Meta-Level Static Analysis
JavaScript is one of the most dominant programming languages. However, despite its popularity, it is a challenging task to correctly understand the behaviors of JavaScript programs because of their highly dynamic nature. Researchers have developed various static analyzers that strive to conform to ECMA-262, the standard specification of JavaScript. Unfortunately, all the existing JavaScript static analyzers require manual updates for new language features. This problem has become more critical since 2015 because the JavaScript language itself rapidly evolves with a yearly release cadence and open development process. In this paper, we present JSAVER, the first tool that automatically derives JavaScript static analyzers from language specifications. The main idea of our approach is to extract a definitional interpreter from ECMA-262 and perform a meta-level static analysis with the extracted interpreter. A meta-level static analysis is a novel technique that indirectly analyzes programs by analyzing a definitional interpreter with the programs. We also describe how to indirectly configure abstract domains and analysis sensitivities in a meta-level static analysis. For evaluation, we derived a static analyzer from the latest ECMA-262 (ES12, 2021) using JSAVER. The derived analyzer soundly analyzed all applicable 18,556 official conformance tests with 99.0% of precision in 590 ms on average. In addition, we demonstrate the configurability and adaptability of JSAVER with several case studies.
ESEC/FSE'22 presentation: Automatically Deriving JavaScript Static Analyzers from Specifications using Meta-Level Static Analysis
JavaScript is one of the most dominant programming languages. However, despite its popularity, it is a challenging task to correctly understand the behaviors of JavaScript programs because of their highly dynamic nature. Researchers have developed various static analyzers that strive to conform to ECMA-262, the standard specification of JavaScript. Unfortunately, all the existing JavaScript static analyzers require manual updates for new language features. This problem has become more critical since 2015 because the JavaScript language itself rapidly evolves with a yearly release cadence and open development process. In this paper, we present JSAVER, the first tool that automatically derives JavaScript static analyzers from language specifications. The main idea of our approach is to extract a definitional interpreter from ECMA-262 and perform a meta-level static analysis with the extracted interpreter. A meta-level static analysis is a novel technique that indirectly analyzes programs by analyzing a definitional interpreter with the programs. We also describe how to indirectly configure abstract domains and analysis sensitivities in a meta-level static analysis. For evaluation, we derived a static analyzer from the latest ECMA-262 (ES12, 2021) using JSAVER. The derived analyzer soundly analyzed all applicable 18,556 official conformance tests with 99.0% of precision in 590 ms on average. In addition, we demonstrate the configurability and adaptability of JSAVER with several case studies.
Property Graph Support in Relational Database
Presentation to Data Community Conference Switzerland 2022 about the Property Graph feature in Oracle DB 23c.
Industrial Strength Static Detection for Cryptographic API Misuses
We describe our experience of building an industrial-strength cryptographic vulnerability detector, which aims to detect cryptographic API misuses in Java(TM). Based on the detection algorithms of CryptoGuard, we integrated the detection into the Oracle internal code scanning platform Parfait. The goal of the Parfait-based cryptographic vulnerability detection is to provide precise and scalable cryptographic code screening for large-scale industrial projects. We discuss the needs and challenges of static cryptographic vulnerability screening in an industrial environment.
Analysing Temporality in General-Domain Entailment Graphs
Entailment Graphs based on open relation extraction run the risk of learning spurious entailments (e.g. win against ⊨ lose to) from antonymous predications that are observed with the same entities referring to different times. Previous research has demonstrated the potential of using temporality as a signal to avoid learning these entailments in the sports domain. We investigate whether this extends to the general news domain. Our method introduces a temporal window that is set dynamically for each eventuality using a temporally-informed language model. We evaluate our models on a sports-specific dataset, and ANT – a novel general-domain dataset based on WordNet antonym pairs. We find that whilst it may be useful to reinterpret the Distributional Inclusion Hypothesis to include time for the sports news domain, this does not apply to the general news domain.
RASPunzel: A Novel RASP Solution
This document presents an overview of project RASPunzel. It highlights the approach of using an allowlist (instead of a deny list) and summarises the key advantages.
TruffleTaint: Polyglot Dynamic Taint Analysis on GraalVM
Dynamic taint analysis tracks the propagation of specific values while a program executes. To this end, a taint label is attached to these values and dynamically propagated to any values derived from them. Frequent application of this analysis technique in many fields has led to the development of general-purpose analysis platforms with taint propagation capabilities. However, these platforms generally limit analysis developers to a specific implementation language, propagation semantics or taint label representation, and they provide no tooling support for analysis development. In this paper we present a language-agnostic approach for implementing a dynamic taint analysis independently of the analysis platform that it is executed on. We implemented this approach in TruffleTaint, a platform for taint propagation in multiple programming languages. We show how our approach enables TruffleTaint to provide analysis implementers with more control over the semantics and implementation language of their taint analysis than current analysis platforms, and with a more capable development environment. We further show that our approach enables the development of both tooling infrastructure for taint analysis research and data-flow enabled tools for end-users.
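A minimal Java sketch of the underlying propagation principle: values of interest carry a label, and derived values inherit the labels of their inputs. A real platform such as the one described here attaches labels inside the language runtime itself rather than in a side table keyed by value, so this is only an approximation of the semantics.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of dynamic taint propagation with a side table of labels.
final class TaintSketch {
    private final Map<Object, String> labels = new HashMap<>();

    // Attach a taint label to a value (e.g., at an untrusted input source).
    void taint(Object value, String label) { labels.put(value, label); }

    String labelOf(Object value) { return labels.get(value); }

    // Binary-operation wrapper: the result is tainted if either input is.
    int addAndPropagate(Integer a, Integer b) {
        int result = a + b;
        String label = labels.getOrDefault(a, labels.get(b));
        if (label != null) {
            labels.put(result, label); // propagate to the derived value
        }
        return result;
    }
}
```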
ML-SOCO: Machine Learning-Based Self-Optimizing Compiler Optimizations
Compiler optimizations often involve hand-crafted heuristics to guide the optimization process. These heuristics are designed to benefit the average program and are otherwise static or only customized by profiling information. We propose machine learning-based self-optimizing compiler optimizations (ML-SOCO), a novel approach for fitting optimizations in a dynamic compiler to a specific environment. ML-SOCO explores—at run time—the impact of optimization decisions and uses this data to train or update a machine learning model. Related work, which has primarily targeted static compilers, has already shown that machine learning can outperform human-crafted heuristics. Our approach is specifically tailored to dynamic compilation and uses concepts like deoptimization for transparently switching between generating data and performing machine learning decisions during compilation. We implemented ML-SOCO in the GraalVM compiler, which is one of the most highly optimizing Java compilers on the market. When evaluating ML-SOCO by replacing a loop peeling heuristic with a learned model we encountered multiple speedups larger than 30% in established benchmarks. Apart from improving performance, ML-SOCO can also be used to assist compiler engineers when improving heuristics for specific domains.
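A minimal Java sketch of the mode switch described above, for a single decision point: while no model is available, the compiler uses the hand-crafted heuristic and records feature vectors as training data; once a model exists, it takes over the decision. All names and the feature encoding are illustrative, not the GraalVM implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of a self-optimizing compiler heuristic for loop peeling.
final class LoopPeelingDecision {
    interface Model {
        boolean predict(double[] features);
    }

    private Model model; // null until enough data has been collected and trained
    private final List<double[]> trainingData = new ArrayList<>();

    boolean shouldPeel(double[] loopFeatures) {
        if (model != null) {
            return model.predict(loopFeatures); // learned decision mode
        }
        trainingData.add(loopFeatures);         // data-generation mode
        return defaultHeuristic(loopFeatures);
    }

    // Stand-in for the hand-crafted heuristic the model eventually replaces.
    private boolean defaultHeuristic(double[] f) {
        return f[0] < 100; // illustrative: peel only small loops
    }

    void installModel(Model trained) { this.model = trained; }
}
```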
Automatic Array Transformation to Columnar Storage at Run Time
Today’s huge memories make it possible to store and process large data structures in memory instead of in a database. Hence, accesses to this data should be optimized, which is normally relegated either to the runtimes and compilers or is left to the developers, who often lack the knowledge about optimization strategies. As arrays are often part of the language, developers frequently use them as an underlying storage mechanism. Thus, optimization of arrays may be vital to improve performance of data-intensive applications. While compilers can apply numerous optimizations to speed up accesses, it would also be beneficial to adapt the actual layout of the data in memory to improve cache utilization. However, runtimes and compilers typically do not perform such memory layout optimizations. In this work, we present an approach to dynamically perform memory layout optimizations on arrays of objects to transform them into a columnar memory layout, a storage layout frequently used in analytical applications that enables faster processing of read-intensive workloads. By integration into a state-of-the-art JavaScript runtime, our approach can speed up queries for large workloads by up to 7x, where the initial transformation overhead is amortized over time.
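A minimal Java sketch of the row-to-columnar transformation in its simplest form: an array of objects is replaced by one array per field, so a read-intensive scan over a single field becomes a contiguous memory walk. Class and field names are illustrative; the paper performs this transformation automatically inside the runtime rather than by hand.

```java
// Row-oriented representation: one object per record.
final class Row {
    int id;
    double price;
    Row(int id, double price) { this.id = id; this.price = price; }
}

// Columnar representation: one array per field.
final class ColumnarRows {
    final int[] ids;
    final double[] prices;

    ColumnarRows(Row[] rows) {
        ids = new int[rows.length];
        prices = new double[rows.length];
        for (int i = 0; i < rows.length; i++) { // one-time transformation cost
            ids[i] = rows[i].id;
            prices[i] = rows[i].price;
        }
    }

    // Read-intensive query: touches only the price column, so the scan
    // is cache-friendly and reads memory contiguously.
    double sumPrices() {
        double sum = 0;
        for (double p : prices) sum += p;
        return sum;
    }
}
```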
Efficient Property Projections of Graph Queries over Relational Data
Specialized graph data management systems have made significant advances in storing and analyzing graph-structured data. However, a large fraction of the data of interest still resides in relational database systems (RDBMS) due to their maturity and for security reasons. Recent studies, in view of composability, show that the execution of graph queries over relational databases (i.e., a graph layer on top of an RDBMS) can provide competitive performance compared to specialized graph databases. While using the standard property graph model for graph querying, one of the main bottlenecks for efficient query processing, under memory constraints, is property projections, i.e., projecting properties of nodes along paths matching a given pattern. This is because graph queries produce a large number of matching paths, resulting in many requests to the data storage or a large memory footprint to access their properties. In this paper, we propose a set of novel techniques exploiting the inherent structure of the graph (aka a graph projection cache manager) to provide efficient property projections. The controlled memory footprint of our solution makes it practical in multi-tenant database deployments. The empirical results on a social graph show that our solution reduces the number of accesses to the data storage by more than an order of magnitude, resulting in graph queries being up to 3.1X faster than the baseline.
Proof Engineering with Predicate Transformer Semantics
We present a lightweight, open source Agda framework for manually verifying effectful programs using predicate transformer semantics. We represent the abstract syntax trees (ASTs) of effectful programs with a generalized algebraic datatype (GADT), whose generality enables even complex operations to be primitive AST nodes. Users can then assign bespoke predicate transformers to such operations to aid the proof effort, for example by automatically decomposing proof obligations for branching code. Our framework codifies and generalizes a proof engineering methodology used by the authors to reason about a prototype implementation of LibraBFT, a Byzantine fault tolerant consensus protocol in which code executed by participants may have effects such as updating state and sending messages. Successful use of our framework in this context demonstrates its practical applicability.
FedPerm: Private and Robust Federated Learning by Parameter Permutation
Federated Learning (FL) is a distributed learning paradigm that enables mutually untrusting clients to collaboratively train a common machine learning model. Client data privacy is paramount in FL. At the same time, the model must be protected from poisoning attacks from adversarial clients. Existing solutions address these two problems in isolation. We present FedPerm, a new FL algorithm that addresses both these problems by combining a novel intra-model parameter shuffling technique that amplifies data privacy, with Private Information Retrieval (PIR) based techniques that permit cryptographic aggregation of clients’ model updates. The combination of these techniques further helps the federation server constrain parameter updates from clients so as to curtail effects of model poisoning attacks by adversarial clients. We further present FedPerm’s unique hyperparameters that can be used effectively to trade off computation overheads with model utility. Our empirical evaluation on the MNIST dataset demonstrates FedPerm’s effectiveness over existing Differential Privacy (DP) enforcement solutions in FL.
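The shuffling primitive itself is simple; a hedged Java sketch (seed handling and aggregation are simplified away, and the PIR-based aggregation is out of scope here):

    import java.util.Random;

    final class ParameterPermutation {
        private final int[] perm;   // perm[i] = destination index of parameter i

        ParameterPermutation(int size, long clientSeed) {
            perm = new int[size];
            for (int i = 0; i < size; i++) perm[i] = i;
            Random rnd = new Random(clientSeed);
            for (int i = size - 1; i > 0; i--) {   // Fisher-Yates shuffle
                int j = rnd.nextInt(i + 1);
                int t = perm[i]; perm[i] = perm[j]; perm[j] = t;
            }
        }

        /** Client side: permute the flattened update before sending it. */
        double[] shuffle(double[] update) {
            double[] out = new double[update.length];
            for (int i = 0; i < update.length; i++) out[perm[i]] = update[i];
            return out;
        }

        /** Client side: invert the permutation on the model it receives back. */
        double[] unshuffle(double[] aggregated) {
            double[] out = new double[aggregated.length];
            for (int i = 0; i < aggregated.length; i++) out[i] = aggregated[perm[i]];
            return out;
        }
    }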
Experimental Procedures for Exploiting Structure in AutoML Loss Landscapes
Recent observations regarding the structural simplicity of algorithm configuration landscapes have spurred the development of new configurators that obtain provably and empirically better performance. Inspired by these observations, we recently performed a similar analysis of AutoML Loss Landscapes – that is, the relationship between hyper-parameter configurations and machine learning model performance. In this study, we propose two new variations of an existing, state-of-the-art hyper-parameter configuration procedure. We designed each method to exploit a specific property that we observed to be common among most AutoML loss landscapes; however, we demonstrate that neither is competitive with existing baselines. In light of this result, we construct artificial algorithm configuration scenarios that allow us to show when the two new methods can be expected to outperform their baselines and when they cannot, thereby providing additional insights into AutoML loss landscapes.
N-1 Experts: Unsupervised Anomaly Detection Model Selection
Manually finding the best combination of machine learning training algorithm, model and hyper-parameters can be challenging. In supervised settings, this burden has been alleviated with the introduction of automated machine learning (AutoML) methods. However, similar methods are noticeably absent for fully unsupervised applications, such as anomaly detection. We introduce one of the first such methods, N-1 Experts, which we compare to a recent state-of-the-art baseline, MetaOD, and show favourable performance.
Distinct Value Estimation from a Sample: Statistical Methods vs. Machine Learning
Estimating the number of distinct values (NDV) in a dataset is an important operation in modern database systems for many tasks, including query optimization. In large scale systems, tables often contain billions of rows and wrong optimizer decisions can cause severe deterioration in query performance. Additionally, in many situations, such as having large tables or NDV estimation after the application of filters, it is not feasible to scan the entire dataset to compute the number of distinct values. In such cases, the only available option is to use a dataset sample to estimate the NDV. This, however, is not trivial, as data properties of the sample usually do not mirror the properties of the full dataset. Related work has shown that this kind of estimation is prone to large errors. In this paper, we present two novel approaches for the problem of estimating the number of distinct values from a dataset sample. Our first approach is a novel statistical estimator that shows good and robust results across a broad range of datasets. The second approach is based on Machine Learning (ML), making this the first time ML has been applied to this problem. Both approaches outperform the state-of-the-art, with the ML approach reducing the average error by 3x for real-world datasets. Beyond pure prediction quality, both our approaches have their own set of advantages and disadvantages, and we show that the right approach actually depends on the specific application scenario.
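For context, a classical sample-based baseline (the GEE estimator of Charikar et al.), sketched in Java; this is a known estimator from prior work, not either of the paper's new approaches:

    import java.util.HashMap;
    import java.util.Map;

    final class NdvEstimate {
        /**
         * GEE: sqrt(N/n) * f1 + (values seen at least twice), where f1 is the
         * number of values occurring exactly once in the sample, n is the
         * sample size, and N is the full table size.
         */
        static double gee(long[] sample, long fullSize) {
            Map<Long, Integer> counts = new HashMap<>();
            for (long v : sample) counts.merge(v, 1, Integer::sum);

            long f1 = 0, fRest = 0;
            for (int c : counts.values()) {
                if (c == 1) f1++; else fRest++;
            }
            return Math.sqrt((double) fullSize / sample.length) * f1 + fRest;
        }
    }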
Pruning Networks During Training via Auxiliary Parameters
Neural networks have perennially been limited by the physical constraints of implementation on real hardware, and the desire for improved accuracy often drives the model size to the breaking point. The task of reducing the size of a neural network, whether to meet memory constraints, inference-time speed, or generalization capabilities, is therefore well-studied. In this work, we present an extremely simple scheme to reduce model size during training, by introducing auxiliary parameters to the inputs of each layer of the neural network, and a regularization penalty that encourages the network to eliminate unnecessary variables from the computation graph. Though related to many prior works, this scheme offers several advantages: it is extremely simple to implement; the network eliminates unnecessary variables as part of training, without requiring any back-and-forth between training and pruning; and it dramatically reduces the number of parameters in the networks while maintaining high accuracy.
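A minimal sketch of the scheme under stated assumptions (shapes and names are illustrative): one trainable gate per layer input, and an L1 penalty added to the loss that drives unneeded gates to zero so their inputs can be pruned after training.

    import java.util.Arrays;

    final class GatedLayer {
        final double[][] weights;  // [out][in]
        final double[] gates;      // one auxiliary parameter per input

        GatedLayer(double[][] weights) {
            this.weights = weights;
            this.gates = new double[weights[0].length];
            Arrays.fill(gates, 1.0); // start with all inputs enabled
        }

        double[] forward(double[] input) {
            double[] out = new double[weights.length];
            for (int o = 0; o < weights.length; o++)
                for (int i = 0; i < input.length; i++)
                    out[o] += weights[o][i] * (gates[i] * input[i]);
            return out;
        }

        // Added to the training loss; its gradient shrinks unused gates, and
        // inputs whose gate reaches zero drop out of the computation graph.
        double l1Penalty(double lambda) {
            double sum = 0;
            for (double g : gates) sum += Math.abs(g);
            return lambda * sum;
        }
    }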
Subject Membership Inference Attacks in Federated Learning
Privacy in Federated Learning (FL) is studied at two different granularities - item-level, which protects individual data points, and user-level, which protects each user (participant) in the federation. Nearly all of the private FL literature is dedicated to the study of privacy attacks and defenses alike at these two granularities. More recently, subject-level privacy has emerged as an alternative privacy granularity to protect the privacy of individuals whose data is spread across multiple (organizational) users in cross-silo FL settings. However, the research community lacks a good understanding of the practicality of this threat, as well as various factors that may influence subject-level privacy. A systematic study of these patterns requires complete control over the federation, which is impossible with real-world datasets. We design a simulator for generating various synthetic federation configurations, enabling us to study how properties of the data, model design and training, and the federation itself impact subject privacy risk. We propose three inference attacks for subject-level privacy and examine the interplay between all factors within a federation. Our takeaways generalize to real-world datasets like FEMNIST, giving credence to our findings.
Synthesis of Java Deserialisation Filters from Examples (Conference Video)
Java natively supports serialisation and deserialisation, features that are necessary to enable distributed systems to exchange Java objects. Deserialisation of data from malicious sources can lead to security exploits including remote code execution because by default Java does not validate deserialised data. In the absence of validation, a carefully crafted payload can trigger arbitrary functionality. The state-of-the-art general mitigation strategy for deserialisation exploits in Java is deserialisation filtering that validates the contents of an object input stream before the object is deserialised using user-provided filters. In this paper we describe a novel technique called ds-prefix for automatic synthesis of deserialisation filters (as regular expressions) from examples. We focus on synthesis of allowlists (permitted behaviours) as they provide a better level of security. Ds-prefix is based on deserialisation heuristics and specifically targets synthesis of deserialisation allowlists. We evaluate our approach by executing ds-prefix on popular open-source systems and show that ds-prefix can produce filters preventing real CVEs using a small number of training examples. We also compare our approach with other synthesis tools which demonstrates that ds-prefix outperforms existing tools and achieves better precision.
Synthesis of Java Deserialisation Filters from Examples
Java natively supports serialisation and deserialisation, features that are necessary to enable distributed systems to exchange Java objects. Deserialisation of data from malicious sources can lead to security exploits including remote code execution because by default Java does not validate deserialised data. In the absence of validation, a carefully crafted payload can trigger arbitrary functionality. The state-of-the-art general mitigation strategy for deserialisation exploits in Java is deserialisation filtering that validates the contents of an object input stream before the object is deserialised using user-provided filters. In this paper we describe a novel technique called ds-prefix for automatic synthesis of deserialisation filters (as regular expressions) from examples. We focus on synthesis of allowlists (permitted behaviours) as they provide a better level of security. Ds-prefix is based on deserialisation heuristics and specifically targets synthesis of deserialisation allowlists. We evaluate our approach by executing ds-prefix on popular open-source systems and show that ds-prefix can produce filters preventing real CVEs using a small number of training examples. We also compare our approach with other synthesis tools which demonstrates that ds-prefix outperforms existing tools and achieves better precision.
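A synthesized allowlist of this kind can be enforced with the JDK's built-in deserialization filtering (JEP 290). The pattern string below is a hypothetical ds-prefix-style output, not one taken from the paper:

    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.ObjectInputFilter;
    import java.io.ObjectInputStream;

    final class FilteredDeserialization {
        static Object readFiltered(byte[] bytes) throws IOException, ClassNotFoundException {
            // Allow two package prefixes; '!*' rejects everything else.
            ObjectInputFilter allowlist =
                ObjectInputFilter.Config.createFilter("com.example.model.**;java.util.*;!*");
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                in.setObjectInputFilter(allowlist); // classes outside the allowlist are rejected
                return in.readObject();
            }
        }
    }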
Synthesis of Java Deserialisation Filters from Examples (Presentation Slides)
Java natively supports serialisation and deserialisation, features that are necessary to enable distributed systems to exchange Java objects. Deserialisation of data from malicious sources can lead to security exploits including remote code execution because by default Java does not validate deserialised data. In the absence of validation, a carefully crafted payload can trigger arbitrary functionality. The state-of-the-art general mitigation strategy for deserialisation exploits in Java is deserialisation filtering that validates the contents of an object input stream before the object is deserialised using user-provided filters. In this paper we describe a novel technique called ds-prefix for automatic synthesis of deserialisation filters (as regular expressions) from examples. We focus on synthesis of allowlists (permitted behaviours) as they provide a better level of security. Ds-prefix is based on deserialisation heuristics and specifically targets synthesis of deserialisation allowlists. We evaluate our approach by executing ds-prefix on popular open-source systems and show that ds-prefix can produce filters preventing real CVEs using a small number of training examples. We also compare our approach with other synthesis tools which demonstrates that ds-prefix outperforms existing tools and achieves better F1-score.
ONNX and the JVM
Integrating machine learning into enterprises requires building and deploying ML models in the environments enterprises build their software in. Frequently this is in Java, or another language running on the JVM. In this talk we'll cover some of our recent work bringing the ONNX ecosystem to Java. We'll discuss uses of ONNX Runtime from Java, and also our work writing model converters from our Java ML library into ONNX format.
Experience: Model-Based, Feedback-Driven, Greybox Web Fuzzing with BackREST
Slides for the corresponding ECOOP 2022 paper.
Subject Granular Differential Privacy in Federated Learning
This paper introduces subject granular privacy in the Federated Learning (FL) setting, where a subject is an individual whose private information is embodied by several data items either confined within a single federation user or distributed across multiple federation users. We formally define the notion of subject level differential privacy for FL. We propose three new algorithms that enforce subject level DP. Two of these algorithms are based on notions of user level local differential privacy (LDP) and group differential privacy respectively. The third algorithm is based on a novel idea of hierarchical gradient averaging (HiGradAvgDP) for subjects participating in a training mini-batch. We also introduce horizontal composition of privacy loss for a subject across multiple federation users. We show that horizontal composition is equivalent to sequential composition in the worst case. We prove the subject level DP guarantee for all our algorithms and empirically analyze them using the FEMNIST and Shakespeare datasets. Our evaluation shows that, of our three algorithms, HiGradAvgDP delivers the best model performance, approaching that of a model trained using a DP-SGD based algorithm that provides a weaker item level privacy guarantee.
Automatic Root Cause Quantification for Missing Edges in JavaScript Call Graphs
Building sound and precise static call graphs for real-world JavaScript applications poses an enormous challenge, due to many hard-to-analyze language features. Further, the relative importance of these features may vary depending on the call graph algorithm being used and the class of applications being analyzed. In this paper, we present a technique to automatically quantify the relative importance of different root causes of call graph unsoundness for a set of target applications. The technique works by identifying the dynamic function data flows relevant to each call edge missed by the static analysis, correctly handling cases with multiple root causes and inter-dependent calls. We apply our approach to perform a detailed study of the recall of a state-of-the-art call graph construction technique on a set of framework-based web applications. The study yielded a number of useful insights. We found that while dynamic property accesses were the most common root cause of missed edges across the benchmarks, other root causes varied in importance depending on the benchmark, potentially useful information for an analysis designer. Further, with our approach, we could quickly identify and fix a recall issue in the call graph builder we studied, and also quickly assess whether a recent analysis technique for Node.js-based applications would be helpful for browser-based code. All of our code and data is publicly available, and many components of our technique can be re-used to facilitate future studies.
Experience: Model-Based, Feedback-Driven, Greybox Web Fuzzing with BackREST
Following the advent of the American Fuzzy Lop (AFL), fuzzing had a surge in popularity, and modern day fuzzers range from simple blackbox random input generators to complex whitebox concolic frameworks that are capable of deep program introspection. Web application fuzzers, however, did not benefit from the tremendous advancements in fuzzing for binary programs and remain largely blackbox in nature. In this experience paper, we show how techniques like state-aware crawling, type inference, coverage and taint analysis can be integrated with a black-box fuzzer to find more critical vulnerabilities, faster (speedups between 7.4x and 25.9x). Comparing BackREST against three other web fuzzers on five large (>500 KLOC) Node.js applications shows how it consistently achieves comparable coverage while reporting more vulnerabilities than the state of the art. Finally, using BackREST, we uncovered eight 0-days, out of which six were not reported by any other fuzzer. All the 0-days have been disclosed and most are now public, including two in the highly popular Sequelize and MongoDB libraries.
Anomaly Detection for Cybersecurity and the Need for Explainable AI
Machine learning is increasingly applied in the cybersecurity domain in order to build solutions capable of protecting against attacks that escape rule-based systems. Attacks are nowadays constantly evolving, since adversaries are always creating new approaches or tweaking existing ones: it is thus not possible to rely exclusively on supervised techniques. This talk will focus on the role of anomaly detection techniques in real-world security applications, and how explainability is necessary in order to translate the anomalies detected by the system into actionable events.
AutoML Loss Landscapes
As interest in machine learning and its applications continues to increase, how to choose the best models and hyper-parameter settings becomes more important. This problem is known to be challenging for human experts, and consequently, a growing number of methods have been proposed for solving it, giving rise to the area of automated machine learning (AutoML). Many of the most popular AutoML methods are based on Bayesian optimization, which makes only weak assumptions about how modifying hyper-parameters affects the loss of a model. This is a safe assumption that yields robust methods, as the AutoML loss landscapes that relate hyper-parameter settings to loss are poorly understood. We build on recent work on the study of one-dimensional slices of algorithm configuration landscapes by introducing new methods that test n-dimensional landscapes for statistical deviations from uni-modality and convexity, and we use them to show that a diverse set of AutoML loss landscapes are highly structured. We introduce a method for assessing the significance of hyper-parameter partial derivatives, which reveals that most (but not all) AutoML loss landscapes have only a small number of hyper-parameters that interact strongly. To further assess hyper-parameter interactions, we introduce a simplistic optimization procedure that assumes each hyper-parameter can be optimized independently, a single time in sequence, and we show that it obtains configurations that are statistically tied with optimal in all of the n-dimensional AutoML loss landscapes that we studied. Our results suggest many possible new directions for substantially improving the state of the art in AutoML.
Towards Formal Verification of HotStuff-based Byzantine Fault Tolerant Consensus in Agda
LibraBFT is a Byzantine Fault Tolerant (BFT) consensus protocol based on HotStuff. We present an abstract model of the protocol underlying HotStuff/LibraBFT, and formal, machine-checked proofs of their core correctness (safety) property and an extended condition that enables non-participating parties to verify committed results. (Liveness properties would be proved for specific implementations, not for the abstract model presented in this paper.) A key contribution is precisely defining assumptions about the behavior of honest peers, in an abstract way, independent of any particular implementation. Therefore, our work is an important step towards proving correctness of an entire class of concrete implementations, without repeating the hard work of proving correctness of the underlying protocol. The abstract proofs are for a single configuration (epoch); extending these proofs across configuration changes is future work. Our models and proofs are expressed in Agda, and are available in open source.
Runtime Prevention of Deserialization Attacks
Untrusted deserialization exploits, where a serialised object graph is used to achieve denial-of-service or arbitrary code execution, have become so prominent that they were introduced in the 2017 OWASP Top 10. In this paper, we present a novel and lightweight approach for runtime prevention of deserialization attacks using Markov chains. The intuition behind our work is that the features and ordering of classes in malicious object graphs make them distinguishable from benign ones. Preliminary results indeed show that our approach achieves an F1-score of 0.94 on a dataset of 264 serialised payloads, collected from an industrial Java EE application server and a repository of deserialization exploits.
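A sketch of the scoring idea with assumed data structures (the paper's feature set is richer): a first-order Markov chain over class-name sequences is trained on benign object graphs, and streams whose average log-likelihood falls below a threshold are rejected.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    final class DeserializationChain {
        private final Map<String, Map<String, Double>> transitions = new HashMap<>();
        private static final double UNSEEN = 1e-6;  // smoothing for unseen transitions

        void train(List<List<String>> benignClassSequences) {
            Map<String, Map<String, Integer>> counts = new HashMap<>();
            for (List<String> seq : benignClassSequences)
                for (int i = 0; i + 1 < seq.size(); i++)
                    counts.computeIfAbsent(seq.get(i), k -> new HashMap<>())
                          .merge(seq.get(i + 1), 1, Integer::sum);
            counts.forEach((from, tos) -> {
                double total = tos.values().stream().mapToInt(Integer::intValue).sum();
                Map<String, Double> probs = new HashMap<>();
                tos.forEach((to, c) -> probs.put(to, c / total));
                transitions.put(from, probs);
            });
        }

        /** Low scores indicate class orderings unlike any benign graph. */
        double avgLogLikelihood(List<String> classSequence) {
            double sum = 0;
            int n = 0;
            for (int i = 0; i + 1 < classSequence.size(); i++, n++) {
                double p = transitions
                    .getOrDefault(classSequence.get(i), Map.of())
                    .getOrDefault(classSequence.get(i + 1), UNSEEN);
                sum += Math.log(p);
            }
            return n == 0 ? 0 : sum / n;
        }
    }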
An approach to translating Haskell programs to Agda and reasoning about them
We are using the Agda programming language and proof assistant to formally verify correctness of a Byzantine Fault Tolerant consensus implementation based on HotStuff / DiemBFT. The Agda implementation is a translation of our Haskell implementation, which is based on DiemBFT. This short paper focuses on one aspect of this work. We have developed a library that enables the translated Agda implementation to closely mirror the Haskell code on which it is based, making review and maintenance easier and more efficient, and reducing the risk of translation errors. We also explain how we assign semantics to the syntactic features provided by our library, thus enabling formal reasoning about programs that use them; details of how we reason about the resulting Agda implementation will be presented in a future paper. The library we present is independent of our particular verification project, and is available in open source for others to use and extend.
Towards formal verification of HotStuff-based BFT consensus in Agda
LibraBFT is a Byzantine Fault Tolerant (BFT) consensus protocol based on HotStuff. We present an abstract model of the protocol underlying HotStuff/LibraBFT, and formal, machine-checked proofs of their core correctness (safety) property and an extended condition that enables non-participating parties to verify committed results. (Liveness properties would be proved for specific implementations, not for the abstract model presented in this paper.) A key contribution is precisely defining assumptions about the behavior of honest peers, in an abstract way, independent of any particular implementation. Therefore, our work is an important step towards proving correctness of an entire class of concrete implementations, without repeating the hard work of proving correctness of the underlying protocol. The abstract proofs are for a single configuration (epoch); extending these proofs across configuration changes is future work. Our models and proofs are expressed in Agda, and are available in open source.
Upstream Mitigation Is Not All You Need: Testing the Bias Transfer Hypothesis in Pre-Trained Language Models
A few large, homogeneous, pre-trained models undergird many machine learning systems — and often, these models contain harmful stereotypes learned from the internet. We investigate the bias transfer hypothesis: the theory that social biases (such as stereotypes) internalized by large language models during pre-training transfer into harmful task-specific behavior after fine-tuning. For two classification tasks, we find that reducing intrinsic bias with controlled interventions before fine-tuning does little to mitigate the classifier’s discriminatory behavior after fine-tuning. Regression analysis suggests that downstream disparities are better explained by biases in the fine-tuning dataset. Still, pre-training plays a role: simple alterations to co-occurrence rates in the fine-tuning dataset are ineffective when the model has been pre-trained. Our results encourage practitioners to focus more on dataset quality and context-specific harms.
Upstream Mitigation Is Not All You Need
A few large, homogeneous pre-trained models undergird many machine learning systems — and often, these models contain harmful stereotypes learned from the internet. We investigate the bias transfer hypothesis, the possibility that social biases (such as stereotypes) internalized by large language models during pre-training could also affect task-specific behavior after fine-tuning. For two classification tasks, we find that reducing intrinsic bias with controlled interventions before fine-tuning does little to mitigate the classifier’s discriminatory behavior after fine-tuning. Regression analysis suggests that downstream disparities are better explained by biases in the fine-tuning dataset. Still, pre-training plays a role: simple alterations to co-occurrence rates in the fine-tuning dataset are ineffective when the model has been pre-trained. Our results encourage practitioners to focus more on dataset quality and context-specific harms.
Oracle Cloud Advanced ML Prognostics Innovations for Enterprise Computing Servers
Oracle has a portfolio of Machine Learning (ML) offerings for monitoring time-series telemetry signals for anomaly detection. The product suite is called the Multivariate State Estimation Technique (MSET2); it integrates an advanced prognostic pattern recognition technique with a collection of intelligent data preprocessing (IDP) innovations for high-sensitivity prognostic applications. One of the important applications is monitoring dynamic computer power and catching the early incipience of mechanisms that cause servers to fail, using the telemetry signals of servers. Telemetry signals in computing servers typically include many physical variables (e.g., voltages, currents, temperatures, fan speeds, and power levels) that correlate with system IO traffic, memory utilization, and system throughput. By utilizing the telemetry signals, MSET2 improves power efficiency by monitoring, reporting and forecasting energy consumption, cooling requirements and load utilization of servers. However, a common challenge in the computing server industry is that telemetry signals are never perfect. For example, enterprise-class servers have disparate sampling rates and are often not synchronized in time, resulting in a lead-lag phase change among the various signals. In addition, the enterprise computing industry often uses 8-bit A/D conversion chips for physical sensors. This makes it difficult to discern small variations in the physical variables that are severely quantized because of the use of low-resolution chips. Moreover, missing values often exist in the streaming telemetry signals, which can be caused by a saturated system bus or data transmission errors. This paper describes some features of key IDP algorithms that provide optimal ML solutions to the aforementioned challenges across the enterprise computing industry, assuring optimal ML performance for prognostics, optimal energy efficiency of Enterprise Servers, and streaming analytics.
Temporality in General-Domain Entailment Graph Induction
Entailment Graphs based on open relation extraction run the risk of learning spurious entailments (e.g. win against ⊨ lose to) from antonymous predications that are observed with the same entities referring to different times. Previous research has demonstrated the potential of using temporality as a signal to avoid learning these entailments in the sports domain. We investigate whether this extends to the general news domain. Our method introduces a temporal window that is set dynamically for each eventuality using a temporally informed language model. We evaluate our models on a sports-specific dataset, and ANT – a novel general-domain dataset based on WordNet antonym pairs. We find that whilst it may be useful to reinterpret the Distributional Inclusion Hypothesis to include time for the sports news domain, this does not apply to the general news domain.
Challenges in adopting Machine Learning for Cybersecurity
Machine learning can be a powerful ally in fighting cybercrime, provided that a few challenges in its application can be solved. The Keybridge team at Oracle Labs has experience with developing ML solutions for security use cases. In this talk we would like to share those experiences and discuss three challenges - selecting an ML model, handling of input data (specifically system logs), and transferring to security teams. In the latter challenge, we are particularly interested in bridging the two-way gap in understanding between security teams and ML practitioners.
Constant Blinding on GraalVM
With the advent of JIT-compilers, code-injection attacks have seen a revival in the form of JIT-spraying. JIT-spraying enables an attacker to inject gadgets into executable memory, effectively bypassing W^X and ASLR. In response to JIT-spraying, constant blinding has emerged as a conceptually simple and performance-friendly defense. Unfortunately, a number of increasingly sophisticated attacks have pinpointed the shortcomings of existing constant blinding implementations. In this paper, we present our constant blinding implementation for the GraalVM, taking into account the insights from the last decade regarding the security of constant blinding. We discuss important design decisions and tradeoffs as well as the practical implementation issues encountered when implementing constant blinding for GraalVM. We evaluate the performance impact of our implementation with different configurations and demonstrate its effectiveness by fuzzing for unblinded constants.
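The core transformation, sketched abstractly in Java (the real implementation operates on compiler IR when emitting machine code, not on values at this level):

    import java.security.SecureRandom;

    final class ConstantBlinding {
        private static final SecureRandom RNG = new SecureRandom();

        // At compile time, only the blinded constant and the key are embedded
        // in the generated code, so an attacker-chosen immediate never appears
        // verbatim in executable memory.
        record Blinded(long blinded, long key) {
            static Blinded of(long constant) {
                long key = RNG.nextLong();
                return new Blinded(constant ^ key, key);
            }
            // At run time, a single extra XOR recovers the original value.
            long value() { return blinded ^ key; }
        }

        public static void main(String[] args) {
            Blinded b = Blinded.of(0x3c909090L); // an immediate that could encode a gadget
            assert b.value() == 0x3c909090L;
        }
    }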
"Static Java": The GraalVM Native Image Programming Model
In this talk we will present our vision for “Static Java”: the programming model enabled by GraalVM Native Image. Applications are initialized at image build time, to allow fast startup time and low memory footprint at run time. Counterintuitively, the ahead-of-time compilation of Java bytecode to machine code is not part of the programming model. But since it is an important implementation detail, we will also talk about the benefits and problems of ahead-of-time compilation. We will show where static analysis helps, what the limitations of static analysis are, which compiler optimizations work well both for JIT and AOT compilation, and where additional compiler phases for AOT compilation are necessary.
GraalVM: State of AArch64
While always the de facto choice of the mobile domain, recently machines using Arm's AArch64 ISA have also become prevalent within the laptop, desktop, and server marketplaces. Because of this, it is imperative for the GraalVM ecosystem to not only perform well on AArch64, but to treat AArch64 as an equal peer of AMD64. In my talk, I will give an overview of the current state of GraalVM on AArch64. This includes (i) describing the work involved in creating the GraalVM AArch64 port, (ii) providing an overview of current GraalVM AArch64 features, (iii) explaining the code architecture of the AArch64 backend and how to navigate it, and (iv) presenting some current performance numbers on AArch64. Beyond this overview, I also plan to discuss in detail some of the main challenges in getting AArch64 running on GraalVM, such as adding patching support, abiding by the Java Memory Model, and utilizing AArch64's different addressing modes and branch instructions. I'll also present some of our future plans for the continued improvement of the AArch64 backend.
Toward Just-in-time and Language-agnostic Mutation Testing
Mutation Testing is a popular approach to determine the quality of a suite of unit tests. It is based on the idea that introducing faults into a system-under-test (SUT) should cause tests to fail, otherwise, the test suite might be of insufficient quality. In the language of mutation testing, such a fault is referred to as "mutation", and an instance of the SUT's code that contains the mutation is referred to as "mutant". Mutation testing is computationally expensive and time-consuming. Reasons for this include, for example, a high number of mutations to consider, interrelations between these mutations, and mutant-associated costs such as the cost of mutant creation or the cost of checking whether any tests fail in response. Furthermore, implementing a reliable tool for automatic mutation testing is a significant effort for any language. As a result, mutation testing is only available for some languages. Present mutation tools often rely on modifying code or binary executables. We refer to this as "ahead-of-time" mutation testing. Oftentimes, they neither take dynamic information that is only available at run-time into account nor alter program behavior at run-time. However, mutating via the latter could save costs on mutant creation: if the corresponding module of code is compiled, only the mutated section of code needs to be recompiled. Additional run-time information (like previous execution results of the mutated section) selected by an initial test run could also help to determine the utility of a mutant. Skipping mutants of low utility could have an impact on mutation testing efficiency. We propose to refer to this approach as just-in-time mutation testing. In this paper, we provide a proof of concept for just-in-time and language-agnostic mutation testing. We present preliminary results of a feasibility study that explores the implementation of just-in-time mutation testing based on Truffle's instrumentation API. Based on these results, future research can evaluate the implications of just-in-time and language-agnostic mutation testing.
Autonomous Memory Sizing Formularization for Cloud-based IoT ML Customers
Machine learning IoT use cases involve thousands of sensor signals, and the demand on the cloud is high. One challenge for all cloud companies who seek to deal with big data use cases is the fact that peak memory utilization scales non-linearly with the number of sensors, and sizing cloud shapes properly and autonomously prior to the program run is complicated. To address this issue, Oracle developed an autonomous formularization tool with OCI Anomaly Detection’s patented MSET2 algorithm so RAM capacity and/or VRAM capacity can be optimally sized, which helps developers gain an understanding of the required computing resources beforehand and avoid out-of-memory errors. It also avoids excessively conservative RAM pre-allocations, which saves cost for customers.
Gelato: Feedback-driven and Guided Security Analysis of Client-side Web Applications
Modern web applications are getting more sophisticated by using frameworks that make development easy, but pose challenges for security analysis tools. New analysis techniques are needed to handle such frameworks that grow in number and popularity. In this paper, we describe Gelato, which addresses the most crucial challenges for a security-aware client-side analysis of highly dynamic web applications. In particular, we use a feedback-driven and state-aware crawler that is able to analyze complex framework-based applications automatically, and is guided to maximize coverage of security-sensitive parts of the program. Moreover, we propose a new lightweight client-side taint analysis that outperforms the state-of-the-art tools, requires no modification to browsers, and reports non-trivial taint flows on modern JavaScript applications. Gelato reports vulnerabilities with higher accuracy than existing tools and achieves significantly better coverage on 12 applications of which three are used in production.
Private Federated Learning with Domain Adaptation
Federated learning (FL) was originally motivated by communication bottlenecks in training models from data stored across millions of devices, but the paradigm of distributed training is attractive for models built on sensitive data, even when the number of users is relatively small, such as collaborations between organizations. For example, when training machine learning models from health records, the raw data may be limited in size, too sensitive to be aggregated directly, and concerns about data reconstruction must be addressed. Differential privacy (DP) offers a guarantee about the difficulty of reconstructing individual data points, but achieving reasonable privacy guarantees on small datasets can significantly degrade model accuracy. Data heterogeneity across users may also be more pronounced with smaller numbers of users in the federation pool. We provide a theoretical argument that model personalization offers a practical way to address both of these issues, and demonstrate its effectiveness with experimental results on a variety of domains, including spam detection, named entity recognition on case narratives from the Vaccine Adverse Event Reporting System (VAERS) and image classification using the federated MNIST dataset (FEMNIST).
Industrial Experience of Finding Cryptographic Vulnerabilities in Large-scale Codebases
Enterprise environments often screen large-scale (millions of lines of code) codebases with static analysis tools to find bugs and vulnerabilities. Parfait is a static code analysis tool used in Oracle to find security vulnerabilities in industrial codebases. Recently, many studies show that there are complicated cryptographic vulnerabilities caused by misusing cryptographic APIs in Java™. In this paper, we describe how we realize a precise and scalable detection of these complicated cryptographic vulnerabilities based on the Parfait framework. The key challenge in the detection of cryptographic vulnerabilities is the high false alarm rate caused by pseudo-influences. Pseudo-influences happen if security-irrelevant constants are used in constructing security-critical values. Static analysis is usually unable to distinguish them from hard-coded constants that expose sensitive information. We tackle this problem by specializing the backward dataflow analysis used in Parfait with refinement insights, an idea from the tool CryptoGuard [20]. We evaluate our analyzer on a comprehensive Java cryptographic vulnerability benchmark and eleven large real-world applications. The results show that the Parfait-based cryptographic vulnerability detector can find real-world cryptographic vulnerabilities in large-scale codebases with high true-positive rates and low runtime cost.
I have data and a business problem; now what?
In the last few decades, machine learning has made many great leaps and bounds, thereby substantially improving the state of the art in a diverse range of industry applications. However, for a given dataset and a business use case, non-technical users are faced with many questions that limit the adoption of a machine learning solution. For example:
• Which machine learning model should I use?
• How should I set its hyper-parameters?
• Can I trust what my model learned?
• Does my model discriminate against a marginalized, protected group?
Even for seasoned data scientists, answering these questions can be tedious and time consuming. To address these barriers, the AutoMLx team at Oracle Labs has developed an automated machine learning (AutoML) pipeline that performs automated feature engineering, preprocessing and selection, and then selects a suitable machine learning model and hyper-parameter configuration. To help users understand and trust their "magic" and opaque machine learning models, the AutoMLx package supports a variety of methods that can help explain what the model has learned. In this talk, we will provide an overview of our current AutoMLx methods; we will comment on open questions and our active areas of research; and we will briefly review the projects of our sister teams at Oracle Labs. Finally, we will briefly reflect on some of the key differences between research in a cutting-edge industry lab and research in an academic setting.
Online Selection with Cumulative Fairness Constraints
We propose and study the problem of online selection with cumulative fairness constraints. In this problem, candidates arrive online, i.e., one at a time, and the decision maker must choose to accept or reject each candidate subject to a constraint on the history of decisions made thus far. We introduce deterministic, randomized, and learned policies for selection in this setting. Empirically, we demonstrate that our learned policies achieve the highest utility. However, we also show—using 700 synthetically generated datasets—that the simple, greedy algorithm is often competitive with the optimal sequence of decisions, obviating the need for complex (and often inscrutable) learned policies, in many cases. Theoretically, we analyze the limiting behavior of our randomized approach and prove that it satisfies the fairness constraint with high probability.
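A sketch of the greedy baseline under one assumed constraint form (at every prefix of the history, at least a fraction alpha of accepted candidates must belong to the protected group); the paper studies this setting more generally:

    final class GreedyFairSelection {
        private final double alpha;   // required share of protected acceptances
        private int accepted = 0, acceptedProtected = 0;

        GreedyFairSelection(double alpha) { this.alpha = alpha; }

        /** Accept greedily whenever the candidate has positive utility
         *  and acceptance keeps the decision history feasible. */
        boolean decide(double utility, boolean isProtected) {
            if (utility <= 0) return false;
            int newAccepted = accepted + 1;
            int newProtected = acceptedProtected + (isProtected ? 1 : 0);
            if (newProtected < alpha * newAccepted) return false; // would violate constraint
            accepted = newAccepted;
            acceptedProtected = newProtected;
            return true;
        }
    }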
Scalable Static Analysis to Detect Security Vulnerabilities: Challenges and Solutions
Parfait is a static analysis tool originally developed to find defects in C/C++ systems code. It has since been extended to detect injection attacks in Java and PL/SQL applications. Parfait has been deployed internally at Oracle, is used by thousands of developers, and can be integrated at commit-time, in the nightly build or used standalone. Commit-time integration brings security closer to developers, and provides them with the opportunity to fix defects before they are merged. This poster presents some of the challenges we encountered in the process of extending Parfait from a defect analyser for C/C++ to a security analyser for Java and PL/SQL, and the solutions that enabled us to analyse a variety of commercial enterprise applications in a fast and precise way.
Poster: Unacceptable Behavior: Robust PDF Malware Detection Using Abstract Interpretation
The popularity of the PDF format and the rich JavaScript environment that PDF viewers offer make PDF documents an attractive attack vector for malware developers. Because machine learning-based approaches are subject to adversarial attacks that mimic the structure of benign documents, we propose to detect malicious code inside a PDF by statically reasoning about its possible behaviours using abstract interpretation. A comparison with state-of-the-art PDF malware detection tools shows that our conservative abstract interpretation approach achieves similar accuracy, is more resilient to evasion attacks, and provides explainable reports.
Clonefiles
Explores the concept of clonefiles (aka Linux reflinks) and describes various tools and techniques for efficient introspection and processing.
Montsalvat: Intel SGX Shielding for GraalVM Native Images
The rapid growth of the Java programming language has led to its wide adoption in cloud computing infrastructures. However, Java applications running in untrusted clouds are susceptible to various forms of privileged attacks. The emergence of trusted execution environments (TEEs), i.e., Intel SGX, mitigates this problem. TEEs protect code and data in secure enclaves inaccessible to untrusted software, including the kernel or hypervisors. To efficiently use TEEs, developers are required to manually partition their applications into trusted and untrusted parts. This decreases the trusted computing base (TCB) and minimizes security vulnerabilities. However, partitioning Java applications poses two important challenges: (1) ensuring efficient object communication between the partitioned components, and (2) ensuring garbage collection consistency between them. We present Montsalvat, a tool which provides a practical and intuitive annotation-based partitioning approach for Java applications using secure enclaves. Montsalvat provides an RMI-like mechanism to ensure inter-object communication, as well as consistent garbage collection across the partitioned components. We implement Montsalvat with GraalVM Native Image, a tool which ahead-of-time compiles Java applications into standalone native executables which do not require a JVM at runtime. We perform extensive evaluations of Montsalvat using micro and macro benchmarks, and show that our partitioning approach can lead to up to 6.6× and 2.9× performance boosts in real-world applications (i.e., PalDB and GraphChi) respectively as compared to solutions that naively include the entire applications in the enclave.
Distributed Graph Processing with PGX.D
Graph processing is one of the top data analytics trends. In particular, graph processing comprises two main styles of analysis, namely graph algorithms and graph pattern-matching queries. Classic graph algorithms, such as Pagerank, repeatedly traverse the vertices and edges of the graph and calculate some desired (mathematical) function. Graph queries enable the interactive exploration and pattern matching of graphs. For example, queries like `SELECT p1.name, p2.name FROM MATCH (p1:person)-[:friend]->(p2:person) WHERE p1.country = p2.country` combine the classic operations found in SQL with graph patterns. Both algorithms and queries are very challenging workloads, especially in a distributed setting, where very large graphs are partitioned across multiple machines. In this lecture, I will present how the distributed PGX [1] engine (known as PGX.D; developed at Oracle Labs [2] Zurich) implements efficient algorithms and queries and solves problems, such as data skew and intermediate-result explosion. In brief, for graph algorithms, PGX.D offers the functionality to compile simple sequential textbook-style GreenMarl [3] algorithms to efficient distributed execution. For queries, PGX.D includes a depth-first asynchronous computation runtime [4] that enables limiting the amount of intermediate data during query execution to essentially support "any-size" patterns. [1] http://www.oracle.com/technetwork/oracle-labs/parallel-graph-analytix/overview/index.html [2] https://labs.oracle.com [3] Green-Marl: A DSL for easy and efficient graph analysis, ASPLOS'12. [4] aDFS: An Almost Depth-First-Search Distributed Graph-Querying System. USENIX ATC'21.
Neural Rule-Execution Tracking Machine For Transformer-Based Text Generation
Sequence-to-Sequence (Seq2Seq) neural text generation models, especially the pre-trained ones (e.g., BART and T5), have exhibited compelling performance on various natural language generation tasks. However, the black-box nature of these models limits their application in tasks where specific rules (e.g., controllable constraints, prior knowledge) need to be executed. Previous works either design specific model structures (e.g., Copy Mechanism corresponding to the rule “the generated output should include certain words in the source input”) or implement specialized inference algorithms (e.g., Constrained Beam Search) to execute particular rules through the text generation. These methods require careful case-by-case design and make it difficult to support multiple rules concurrently. In this paper, we propose a novel module named Neural Rule-Execution Tracking Machine, i.e., NRETM, that can be equipped into various transformer-based generators to leverage multiple rules simultaneously to guide the neural generation model for superior generation performance in a unified and scalable way. Extensive experiments on several benchmarks verify the effectiveness of our proposed model in both controllable and general text generation tasks.
Security Research at Oracle Labs, Australia
This is a broad-brush overview of the relevant projects (both past and present) at Oracle Labs, Australia. It also outlines some of the security ideas and software engineering principles that are relevant to tool development and deployment.
Bitemporal Property Graphs to Organize Evolving Systems
This work is a summarized view on the results of a one-year cooperation between Oracle Corp. and the University of Leipzig. The goal was to research the organization of relationships within multi-dimensional time-series data, such as sensor data from the IoT area. We showed in this project that temporal property graphs with some extensions are a prime candidate for this organizational task that combines the strengths of both data models (graph and time-series). The outcome of the cooperation includes four achievements: (1) a bitemporal property graph model, (2) a temporal graph query language, (3) a conception of continuous event detection, and (4) a prototype of a bitemporal graph database that supports the model, language and event detection.
Diverse Data Augmentation via Unscrambling Text with Missing Words
We present the Diverse Augmentation using Scrambled Seq2Seq (DAugSS) algorithm, a fully automated data augmentation mechanism that leverages a model to generate examples in a semi-controllable fashion. The main component of DAugSS is a training procedure in which the generative model is trained to transform a class label and a sequence of tokens into a well-formed sentence of the specified class that contains the specified tokens. Empirically, we show that DAugSS is competitive with or outperforms state-of-the-art, generative models for data augmentation in terms of test set accuracy on 4 datasets. We show that the flexibility of our approach yields datasets with expansive vocabulary, and that models trained on these datasets are more resilient to adversarial attacks than when trained on datasets augmented by competing methods.
Searching Near and Far for Examples in Data Augmentation
In this work, we demonstrate that augmenting a dataset with examples that are far from the initial training set can lead to significant improvements in test set accuracy. We draw on the similarity of deep neural networks and nearest neighbor models. Like a nearest neighbor classifier, we show that, for any test example, augmentation with a single, nearby training example of the same label—followed by retraining—is often sufficient for a BERT-based model to correctly classify the test example. In light of this result, we devise FRaNN, an algorithm that attempts to cover the embedding space defined by the trained model with training examples. Empirically, we show that FRaNN, and its variant FRaNNK, construct augmented datasets that lead to models with higher test set accuracy than either uncertainty sampling or a random augmentation baseline.
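The nearest-neighbor step can be made concrete with a small sketch (embeddings are assumed to come from the trained model; FRaNN itself goes further and covers the embedding space rather than picking single neighbors):

    final class NearestAugmentation {
        /** Returns the index of the closest candidate with the wanted label,
         *  or -1 if none exists; distances are squared Euclidean. */
        static int nearestSameLabel(double[] testEmb, double[][] candEmbs,
                                    int[] candLabels, int wantedLabel) {
            int best = -1;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int i = 0; i < candEmbs.length; i++) {
                if (candLabels[i] != wantedLabel) continue;
                double d = 0;
                for (int j = 0; j < testEmb.length; j++) {
                    double diff = testEmb[j] - candEmbs[i][j];
                    d += diff * diff;
                }
                if (d < bestDist) { bestDist = d; best = i; }
            }
            return best;
        }
    }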
Multivalent Entailment Graphs for Question Answering
Drawing inferences between open-domain natural language predicates is a necessity for true language understanding. There has been much progress in unsupervised learning of entailment graphs for this purpose. We make three contributions: (1) we reinterpret the Distributional Inclusion Hypothesis to model entailment between predicates of different valencies, like DEFEAT(Biden, Trump) |= WIN(Biden); (2) we actualize this theory by learning unsupervised Multivalent Entailment Graphs of open-domain predicates; and (3) we demonstrate the capabilities of these graphs on a novel question answering task. We show that directional entailment is more helpful for inference than non-directional similarity on questions of fine-grained semantics. We also show that drawing on evidence across valencies answers more questions than by using only the same valency evidence.
Open-Domain Contextual Link Prediction and its Complementarity with Entailment Graphs
An open-domain knowledge graph (KG) has entities as nodes and natural language relations as edges, and is constructed by extracting (subject, relation, object) triples from text. The task of open-domain link prediction is to infer missing relations in the KG. Previous work has used standard link prediction for the task. Since triples are extracted from text, we can ground them in the larger textual context in which they were originally found. However, standard link prediction methods only rely on the KG structure and ignore the textual context of the triples. In this paper, we introduce the new task of open-domain contextual link prediction which has access to both the textual context and the KG structure to perform link prediction. We build a dataset for the task and propose a model for it. Our experiments show that context is crucial in predicting missing relations. We also demonstrate the utility of contextual link prediction in discovering out-of-context entailments between relations, in the form of entailment graphs (EG), in which the nodes are the relations. The reverse holds too: out-of-context EGs assist in predicting relations in context.
GraalVM, Python, and Polyglot Programming
Presentation at HPI graduate school, the PhD school at the HPI in Potsdam.
LXM: Better Splittable Pseudorandom Number Generators (and Almost as Fast)
Paper to be submitted to ACM OOPSLA 2021. Abstract: In 2014, Steele, Lea, and Flood presented SplitMix, an object-oriented pseudorandom number generator (PRNG) that is quite fast (9 64-bit arithmetic/logical operations per 64 bits generated) and also splittable. A conventional PRNG object provides a generate method that returns one pseudorandom value and updates the state of the PRNG; a splittable PRNG object also has a second operation, split, that replaces the original PRNG object with two (seemingly) independent PRNG objects, by creating and returning a new such object and updating the state of the original object. Splittable PRNG objects make it easy to organize the use of pseudorandom numbers in multithreaded programs structured using fork-join parallelism. This overall strategy still appears to be sound, but the specific arithmetic calculation used for generate in the SplitMix algorithm has some detectable weaknesses, and the period of any one generator is limited to 2^64. Here we present the LXM family of PRNG algorithms. The idea is an old one: combine the outputs of two independent PRNG algorithms, then (optionally) feed the result to a mixing function. An LXM algorithm uses a linear congruential subgenerator and an F2-linear subgenerator; the examples studied in this paper use an LCG of period 2^16, 2^32, 2^64, or 2^128 with one of the multipliers recommended by L'Ecuyer or by Steele and Vigna, and an F2-linear generator of the xoshiro family or xoroshiro family as described by Blackman and Vigna. Mixing functions studied in this paper include the MurmurHash3 finalizer function, David Stafford's variants, Doug Lea's variants, and the null (identity) mixing function. Like SplitMix, LXM provides both a generate operation and a split operation. Also like SplitMix, LXM requires no locking or other synchronization (other than the usual memory fence after instance initialization), and is suitable for use with SIMD instruction sets because it has no branches or loops. We analyze the period and equidistribution properties of LXM generators, and present the results of thorough testing of specific members of this family, using the TestU01 and PractRand test suites, not only on single instances of the algorithm but also for collections of instances, used in parallel, ranging in size from 2 to 2^27. Single instances of LXM that include a strong mixing function appear to have no major weaknesses, and LXM is significantly more robust than SplitMix against accidental correlation in a multithreaded setting. We believe that LXM is suitable for the same sorts of applications as SplitMix, that is, "everyday" scientific and machine-learning applications (but not cryptographic applications), especially when concurrent threads or distributed processes are involved.
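The recipe is compact enough to sketch in Java. The mixer below is MurmurHash3's 64-bit finalizer, one of the mixing functions named in the abstract; the LCG multiplier and the seeding are illustrative choices, not the paper's recommended parameters:

    final class LxmSketch {
        private long lcg;                // LCG subgenerator state
        private long x0, x1;             // xoroshiro128 state (must not be all zero)
        private final long addend;       // odd LCG addend; distinguishes streams
        private static final long M = 0xd1342543de82ef95L; // assumed LCG multiplier

        LxmSketch(long seed) {
            lcg = seed;
            x0 = seed ^ 0x9e3779b97f4a7c15L;
            x1 = ~seed | 1;              // guarantees a nonzero xoroshiro state
            addend = (seed << 1) | 1;
        }

        private static long mix(long z) {            // MurmurHash3 finalizer
            z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL;
            z = (z ^ (z >>> 33)) * 0xc4ceb9fe1a85ec53L;
            return z ^ (z >>> 33);
        }

        long nextLong() {
            long result = mix(lcg + x0);  // combine both subgenerators, then mix
            lcg = M * lcg + addend;       // advance the linear congruential part
            long s0 = x0, s1 = x1;        // advance the F2-linear (xoroshiro128) part
            s1 ^= s0;
            x0 = Long.rotateLeft(s0, 24) ^ s1 ^ (s1 << 16);
            x1 = Long.rotateLeft(s1, 37);
            return result;
        }
    }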
Run-time Data Analysis to Drive Compiler Optimizations
Throughout program execution, types may stabilize, variables may become constant, and code sections may turn out to be redundant - all information that just-in-time (JIT) compilers use to achieve peak performance. Yet, since JIT compilation is done on demand for individual code parts, global observations cannot be made. Moreover, global data analysis is an inherently expensive process that collects information over large data sets, making it infeasible in dynamic compilers. With this project, we propose integrating data analysis into a dynamic runtime to speed up big data applications. The goal is to use the detailed run-time information for speculative compiler optimizations based on the shape and complexion of the data to improve performance.
Run-Time Data Analysis in Dynamic Runtimes
Databases are typically faster in processing huge amounts of data than applications with hand-coded data access. Even though modern dynamic runtimes optimize applications intensively, they cannot perform certain optimizations that are traditionally used by database systems as they lack the required information. Thus, we propose to extend the capabilities of dynamic runtimes to allow them to collect fine-grained information of the processed data at run time and use it to perform database-like optimizations. By doing so, we want to enable dynamic runtimes to significantly boost the performance of data-processing workloads. Ideally, applications should be as fast as databases in data-processing workloads by detecting the data schema at run time. To show the feasibility of our approach, we are implementing it in a polyglot dynamic runtime.
LXM: Better Splittable Pseudorandom Number Generators (and Almost as Fast)
Video for a conference presentation at ACM OOPSLA 2021. The video file is 1280x720. An associated SRT file contains the subtitle (closed caption) information separately. The corresponding paper is Archivist 2021-0405. The slides are available in PDF and PowerPoint formats as Archivist 2021-1004.
GraalVM Native Image: Large-scale static analysis for Java
GraalVM Native Image combines static analysis, heap snapshotting, and ahead-of-time compilation to produce a highly optimized standalone executable for a Java application. In this talk, we first introduce the overall architecture of GraalVM Native Image: instead of “just” compiling Java bytecode ahead of time, it also initializes part of the application at build time. This reduces the startup time and memory footprint of the application at run time. In the second part of the talk, we dive into details of the points-to analysis. We show which of our original research ideas worked or did not work when analyzing large production applications; and we show the benefits of tightly integrating the static analysis with the ahead-of-time compiler.
Lightweight On-Stack Replacement in Languages with Unstructured Loops
On-stack replacement (OSR) is a popular technique used by just-in-time (JIT) compilers. A JIT can use OSR to transfer from interpreted to compiled code in the middle of execution, immediately reaping the performance benefits of compilation. This technique typically relies on loop counters, so it cannot be easily applied to languages with unstructured control flow. It is possible to reconstruct the high-level loop structures of an unstructured language using a control flow analysis, but such an analysis can be complicated, expensive, and language-specific. In this paper, we present a more lightweight strategy for OSR in unstructured languages which relies only on detecting backward jumps. We design a simple, language-agnostic API around this strategy for language interpreters. We then discuss our implementation of the API in the Truffle framework, and the design choices we made to make it efficient and correct. In our evaluation, we integrate the API with Truffle's LLVM bitcode interpreter, and find the technique is effective at improving start-up performance without harming warmed-up performance.
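A minimal sketch of what such a backward-jump-based OSR API might look like for a bytecode interpreter, in Java; all names (OsrHook, reportBackwardJump, tryOsr) are ours for illustration and are not the actual Truffle API.

    // Hypothetical backward-jump-based OSR hook for a bytecode interpreter.
    interface OsrHook {
        // Called whenever a jump targets a lower bytecode index (a loop iteration).
        // The runtime counts these events and may trigger compilation.
        void reportBackwardJump(int target);

        // Returns a non-null result when compiled code exists and execution
        // should transfer out of the interpreter at this loop header.
        Object tryOsr(int target, Object interpreterState);
    }

    final class BytecodeLoop {
        Object execute(byte[] bytecode, OsrHook osr, Object state) {
            int pc = 0;
            while (pc < bytecode.length) {
                int next = interpretOne(bytecode, pc, state);
                if (next <= pc) {                    // backward jump detected
                    osr.reportBackwardJump(next);
                    Object result = osr.tryOsr(next, state);
                    if (result != null) {
                        return result;               // continue in compiled code
                    }
                }
                pc = next;
            }
            return state;
        }

        // Dispatch a single instruction and return the next bytecode index (stub).
        private int interpretOne(byte[] bytecode, int pc, Object state) {
            return pc + 1;
        }
    }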
CompGen: Generation of Fast Compilers in a Multi-Language VM
The first Futamura projection enables compilation and high-performance code generation of user programs by partial evaluation of language interpreters. Previous work has shown that it is sufficient to leverage profiling information and use partial evaluation directives in interpreters as hints to drive partial evaluation towards compiled code efficiency. However, this comes with the downside of additional application warm-up time: partial evaluation of language interpreters has to specialize interpreter code on the fly to the dynamic types used at run time to create efficient target code. As a result, the time spent on partial evaluation itself is a significant contributor to the overall compile time of a method. The second Futamura projection solves this problem by self-applying partial evaluation on the partial evaluation algorithm, effectively generating language-specific compilers from interpreters. This typically reduces compilation time compared to the first projection. Previous work employed the second projection to some extent; however, to this day, no generic second Futamura projection approach is used in a state-of-the-art language runtime. Ultimately, the problems of code-size explosion for compiler generation and increased warm-up time remain open research problems. To solve the problems of code-size explosion and self-application warm-up, this paper proposes CompGen, an approach based on code generation of subsets of language interpreters, loosely based upon the idea of the second Futamura projection. We implemented a prototype of CompGen for GraalVM and show that our usage of a novel code-generation algorithm, incorporating interpreter directives, allows us to generate efficient compilers that emit fast target programs which easily outperform the first Futamura projection in compilation time. We evaluated our approach with GraalJS, an ECMAScript-compliant interpreter, and standard JavaScript benchmarks, showing that our approach achieves 2-3x speedups of partial evaluation.
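For reference, the two Futamura projections can be written compactly in standard notation, where mix denotes the partial evaluator:

    \begin{align*}
    \text{target}   &= \mathit{mix}(\mathit{interp},\ \mathit{source}) && \text{(first projection: a compiled program)} \\
    \text{compiler} &= \mathit{mix}(\mathit{mix},\ \mathit{interp})    && \text{(second projection: a generated compiler)}
    \end{align*}

CompGen, as described above, targets the effect of the second projection (a language-specific compiler) without literally self-applying the partial evaluator.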
Tribuo: Machine Learning with Provenance in Java
Machine Learning models are deployed across a wide range of industries, performing a wide range of tasks. Tracking these models and ensuring they behave appropriately is becoming increasingly difficult as the number of models increases. Current ML monitoring systems provide provenance and tracking by layering on top of the library that performs the ML computation, allowing room for developer confusion and mistakes. In this paper we introduce Tribuo, a Java ML library which integrates model training, inference, strong type-safety, runtime checking, and automatic provenance recording into a single framework. All Tribuo's models and evaluations record the full data pipeline of training and testing data, along with the training algorithms, hyperparameters and data transformation steps automatically. This data lives inside the model object and can be persisted separately using common markup formats. Tribuo implements many popular ML algorithms for classification, regression, clustering, multi-label classification and anomaly detection, along with interfaces to XGBoost, TensorFlow and ONNX Runtime. Tribuo's source code is available at https://github.com/oracle/tribuo under an Apache 2.0 license with documentation and tutorials available at https://tribuo.org.
Low-Overhead Multi-Language Dynamic Taint Analysis on Managed Runtimes through Speculative Optimization
Conference presentation of the paper http://ol-archivist.us.oracle.com/archivist/document/2021-0512
Searching Near and Far for Examples in Data Augmentation
In this work, we demonstrate that augmenting a dataset with examples that are far from the initial training set can lead to significant improvements in test set accuracy. We draw on the similarity of deep neural networks and nearest neighbor models. Like a nearest neighbor classifier, we show that, for any test example, augmentation with a single, nearby training example of the same label, followed by retraining, is often sufficient for a BERT-based model to correctly classify the test example. In light of this result, we devise FRaNN, an algorithm that attempts to cover the embedding space defined by the trained model with training examples. Empirically, we show that FRaNN, and its variant FRaNNk, construct augmented datasets that lead to models with higher test set accuracy than either uncertainty sampling or a random augmentation baseline.
Private Cross-Silo Federated Learning for Extracting Vaccine Adverse Event Mentions
Federated Learning (FL) is quickly becoming a go-to distributed training paradigm for users to jointly train a global model without physically sharing their data. Users can indirectly contribute to, and directly benefit from, a much larger aggregate data corpus used to train the global model. However, literature on successful application of FL in real-world problem settings is somewhat sparse. In this paper, we describe our experience applying an FL-based solution to the Named Entity Recognition (NER) task for an adverse event detection application in the context of mass scale vaccination programs. We present a comprehensive empirical analysis of various dimensions of benefits gained with FL-based training. Furthermore, we investigate effects of tighter Differential Privacy (DP) constraints in highly sensitive settings where federation users must enforce Local DP to ensure strict privacy guarantees. We show that local DP can severely cripple the global model's prediction accuracy, thus disincentivizing users from participating in the federation. In response, we demonstrate how recent innovations in personalization methods can help significantly recover the lost accuracy.
Just-in-Time Compiling Ruby Regexps on TruffleRuby
A presentation about the performance benefits gained by the adoption of TRegex in TruffleRuby.
ICDAR 2021 Scientific Literature Parsing Competition
Documents in Portable Document Format (PDF) are ubiquitous, with over 2.5 trillion documents in existence. The PDF format is human readable but not easily understood by machines, and the large number of different styles makes it difficult to process the large variety of documents effectively. Our ICDAR 2021 Scientific Literature Parsing Competition offers participants a large number of training and evaluation examples compared to previous competitions. Top competition results show a significant increase in performance compared to results previously reported on the competition data sets. Most of the current methods for document understanding rely on deep learning, which requires a large number of training examples. We have generated large data sets that have been used in this competition. Our competition is split into two tasks, aimed at understanding document layouts (Task A) and tables (Task B). In Task A, Document Layout Recognition, the submissions with the highest performance combine object detection and specialised solutions for the different categories. In Task B, Table Recognition, top submissions rely on methods to identify table components and post-processing methods to generate the table structure and content. Results from both tasks show impressive performance and open the possibility of high-performance practical applications.
The Future Is Big Graphs: A Community View on Graph Processing Systems
Graphs are, by nature, 'unifying abstractions' that can leverage interconnectedness to represent, explore, predict, and explain real- and digital-world phenomena. Although real users and consumers of graph instances and graph workloads understand these abstractions, future problems will require new abstractions and systems. What needs to happen in the next decade for big graph processing to continue to succeed?
Exploring Time-Space trade-offs for "synchronized" in Lilliput
In the context of Project Lilliput, which attempts to reduce the size of the object header in the HotSpot Java Virtual Machine (JVM), we explore a curated set of synchronization algorithms. Each of the algorithms could serve as a potential replacement implementation for the "synchronized" construct in HotSpot. Collectively, the algorithms illuminate trade-offs in space-time properties. The key design decisions are where to locate synchronization metadata (monitor fields), how to map from an object to those fields, and the lifecycle of the monitor information. The reader is assumed to be familiar with the current HotSpot implementation of "synchronized" as well as the Compact Java Monitors (CJM) design.
Language-Agnostic Integrated Queries in a Managed Polyglot Runtime
Language-integrated query (LINQ) frameworks offer a convenient programming abstraction for processing in-memory collections of data, allowing developers to concisely express declarative queries using general-purpose programming languages. Existing LINQ frameworks rely on the well-defined type system of statically-typed languages such as C# or Java to perform query compilation and execution. As a consequence of this design, they do not support dynamic languages such as Python, R, or JavaScript. Such languages are however very popular among data scientists, who would certainly benefit from LINQ frameworks in data analytics applications. In this work we bridge the gap between dynamic languages and LINQ frameworks. We introduce DynQ, a novel query engine designed for dynamic languages. DynQ is language-agnostic, since it is able to execute SQL queries in a polyglot language runtime. Moreover, DynQ can execute queries combining data from multiple sources, namely in-memory object collections as well as on-file data and external database systems. Our evaluation of DynQ shows performance comparable with equivalent hand-optimized code, and in line with common data-processing libraries and embedded databases, making DynQ an appealing query engine for standalone analytics applications and for data-intensive server-side workloads.
Private Cross-Silo Federated Learning for Extracting Vaccine Adverse Event Mentions
Federated Learning (FL) is quickly becoming a go-to distributed training paradigm for users to jointly train a global model without physically sharing their data. Users can indirectly contribute to, and directly benefit from, a much larger aggregate data corpus used to train the global model. However, literature on successful application of FL in real-world problem settings is somewhat sparse. In this paper, we describe our experience applying an FL-based solution to the Named Entity Recognition (NER) task for an adverse event detection application in the context of mass scale vaccination programs. We present a comprehensive empirical analysis of various dimensions of benefits gained with FL-based training. Furthermore, we investigate effects of tighter Differential Privacy (DP) constraints in highly sensitive settings where federation users must enforce Local DP to ensure strict privacy guarantees. We show that local DP can severely cripple the global model's prediction accuracy, thus disincentivizing users from participating in the federation. In response, we demonstrate how recent innovations in personalization methods can help significantly recover the lost accuracy. We focus our analysis on the Federated Fine-Tuning algorithm, FedFT, and prove that it is not PAC Identifiable, thus making it even more attractive for FL-based training.
Mention Flags (MF): Constraining Transformer-based Text Generators
This paper focuses on Seq2Seq (S2S) constrained text generation where the text generator is constrained to mention specific words which are inputs to the encoder in the generated outputs. Pre-trained S2S models or a Copy Mechanism are trained to copy the surface tokens from encoders to decoders, but they cannot guarantee constraint satisfaction. Constrained decoding algorithms always produce hypotheses satisfying all constraints. However, they are computationally expensive and can lower the generated text quality. In this paper, we propose Mention Flags (MF), which trace whether lexical constraints are satisfied in the generated outputs in a S2S decoder. The MF models are trained to generate tokens until all constraints are satisfied, guaranteeing high constraint satisfaction. Our experiments on the Common Sense Generation task (CommonGen) (Lin et al., 2020), End2end Restaurant Dialog task (E2ENLG) (Dušek et al., 2020) and Novel Object Captioning task (nocaps) (Agrawal et al., 2019) show that the MF models maintain higher constraint satisfaction and text quality than the baseline models and other constrained decoding algorithms, achieving state-of-the-art performance on all three tasks. These results are achieved with a much lower run-time than constrained decoding algorithms. We also show that the MF models work well in the low-resource setting.
aDFS: An Almost Depth-First-Search Distributed Graph-Querying System
Graph processing is an invaluable tool for data analytics. In particular, pattern-matching queries enable flexible graph exploration and analysis, similar to what SQL provides for relational databases. Graph queries focus on following connections in the data; they are a challenging workload because even seemingly trivial queries can easily produce billions of intermediate results and irregular data access patterns. In this paper, we introduce aDFS: a distributed graph-querying system that can process practically any query fully in memory, while maintaining bounded runtime memory consumption. To achieve this behavior, aDFS relies on (i) almost depth-first (aDFS) graph exploration with some breadth-first characteristics for performance, and (ii) non-blocking dispatching of intermediate results to remote edges. We evaluate aDFS against state-of-the-art graph-querying (Neo4J and GraphFrames for Apache Spark), graph-mining (G-Miner, Fractal, and Peregrine), as well as dataflow joins (BiGJoin), and show that aDFS significantly outperforms prior work on a diverse selection of workloads.
aDFS: An Almost Depth-First-Search Distributed Graph-Querying System (Presentation Slides)
Presentation slides for the paper "aDFS: An Almost Depth-First-Search Distributed Graph-Querying System" accepted at USENIX ATC 2021.
Doing More with Less: Characterizing Dataset Downsampling for AutoML
Automated machine learning (AutoML) promises to democratize machine learning by automatically generating machine learning pipelines with little to no user intervention. Typically, a search procedure is used to repeatedly generate and validate candidate pipelines, maximizing a predictive performance metric, subject to a limited execution time budget. While this approach to generating candidates works well for small tabular datasets, the same procedure does not directly scale to larger tabular datasets with 100,000s of observations, often producing fewer candidate pipelines and yielding lower performance, given the same execution time budget. We carry out an extensive empirical evaluation of the impact that downsampling – reducing the number of rows in the input tabular dataset – has on the pipelines produced by a genetic-programming-based AutoML search for classification tasks.
Retail markdown price optimization and inventory allocation under demand parameter uncertainty
This paper discusses a prescriptive analytics approach to solving a joint markdown pricing and inventory allocation optimization problem under demand parameter uncertainty. We consider a retailer capable of price differentiation among multiple customer groups with different demand parameters that are supplied from multiple warehouses or fulfillment centers at different costs. In particular, we consider a situation when the retailer has a limited amount of inventory that must be sold by a certain exit date. Since in most practical situations the demand parameters cannot be estimated exactly, we propose an approach to optimize the expected value of the profit based on the given distribution of the demand parameters and analyze the properties of the solution. We also describe a predictive demand model to estimate the distribution of the demand parameters based on the historical sales data. Since the sales data usually include multiple similar products embedded into a hierarchical structure, we suggest an approach to the demand modeling that takes advantage of the merchandise and location hierarchies.
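Schematically, and in our notation rather than the paper's, the problem is a stochastic program: choose prices p_g for each customer group g and allocations x_{gw} from each warehouse w to maximize expected profit under the demand-parameter distribution:

    \[
    \max_{p,\ x \ge 0}\ \mathbb{E}_{\theta}\Big[\sum_{g,w} (p_g - c_w)\, s_{gw}(p, x; \theta)\Big]
    \quad \text{s.t.} \quad \sum_{g} x_{gw} \le I_w \ \ \forall w,
    \]

where s_{gw} is the realized sales volume served from warehouse w, c_w the fulfillment cost, and I_w the inventory that must be sold by the exit date.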
Scalable String Analysis: An Experience Report (Presentation slides)
Presentation slides for the paper "Scalable String Analysis: An Experience Report" accepted at SOAP'21
Towards Intelligent Application Security
Over the past 20 years we have seen application security evolve from analysing application code through Static Application Security Testing (SAST) tools, to detecting vulnerabilities in running applications via Dynamic Application Security Testing (DAST) tools. The past 10 years have seen new flavours of tools to provide combinations of static and dynamic tools via Interactive Application Security Testing (IAST), examination of the components and libraries of the software called Software Composition Analysis (SCA), protection of web applications and APIs using signature-based Web Application Firewalls (WAF), and monitoring the application and blocking attacks through Runtime Application Self Protection (RASP) techniques. The past 10 years have also seen an increase in the uptake of the DevOps model that combines software development and operations to provide continuous delivery of high-quality software. As security has become more important, the DevOps model has evolved to the DevSecOps model, where software development, operations and security are all integrated. There has also been increasing usage of learning techniques, including machine learning and program synthesis. Several tools have been developed that make use of machine learning to help developers make quality decisions about their code, tests, or the runtime overhead their code produces. However, such techniques have not yet been applied to application security. In this talk I discuss how to provide an automated approach to integrating security into all aspects of application development and operations, aided by learning techniques. This incorporates signals from the code, operations, and beyond, together with automation, to provide actionable intelligence to developers, security analysts, operations staff, and autonomous systems. I will also consider how malware and threat intelligence can be incorporated into this model to support Intelligent Application Security in a rapidly evolving world.
Scalable String Analysis: An Experience Report
Static string analysis underpins many security-related analyses, including detection of SQL injections and cross-site scripting. Even though string analysis has received much attention, none of the known techniques are effective on large codebases. In this paper we present OLSA, a tool for scalable static string analysis of large Java programs. OLSA analysis is based on intra-procedural string value flow graphs connected via call-graph edges. Formally, this uses a context-sensitive grammar to generate the set of possible strings. We evaluate our approach by using OLSA to detect SQL injections and unsafe uses of reflection in the DaCapo benchmarks and a large internal Java codebase, and compare the performance of OLSA with the state-of-the-art string analyser JSA. The results of this experimentation indicate that our approach can analyse industrial-scale codebases in a matter of hours, whereas JSA does not scale to many DaCapo programs. The set of potential strings generated by our string analysis can be used for checking the validity of the reported potential vulnerabilities.
Compiler-Assisted Object Inlining with Value Fields
Object-oriented programming has flourished in many areas, ranging from web-oriented microservices and data processing to databases. However, while representing domain entities as objects is appealing to developers, it leads to high data fragmentation, as data is loaded into applications as large collections of data objects, resulting in high memory footprint and poor locality. To minimize memory footprint and increase memory locality, embedding the payload of an object into another object (object inlining) has been considered before, but existing techniques present severe limitations that prevent it from becoming a widely adopted technique. We argue that object inlining is mostly useful for optimizing objects in the application data-path and that such objects have value semantics, unlocking great potential for inlining objects. We propose value fields, an abstraction which allows fields to be marked as having value semantics. We take advantage of the closed-world assumption provided by GraalVM Native Image to implement object inlining as a compiler phase that modifies both object layouts and accesses to inlined fields. Experimental evaluation shows that using value fields in real-world frameworks such as Apache Spark, Spring Boot, and Micronaut requires minimal to no effort from developers. Results show improvements in throughput of up to 3x, memory footprint reduction of up to 40%, and reduced GC pause times of up to 35%.
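As an illustration of the abstraction, a value field might be declared roughly as follows; the annotation name and API are ours, not necessarily the paper's.

    // Illustrative only: a hypothetical marker for value semantics.
    @interface ValueField {}

    final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    final class Particle {
        // Declared as a value field: a closed-world compiler such as the one in
        // GraalVM Native Image can flatten Point's payload (two doubles) directly
        // into Particle's layout, saving one object header and one pointer
        // indirection per instance.
        @ValueField private final Point position;
        private double mass;

        Particle(Point position, double mass) {
            this.position = position;
            this.mass = mass;
        }
    }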
Modeling memory bandwidth patterns on NUMA machines with performance counters
Modern computers used for data analytics are often NUMA systems with multiple sockets per machine, multiple cores per socket, and multiple thread contexts per core. To get the peak performance out of these machines requires the correct number of threads to be placed in the correct positions on the machine. One particularly interesting element of the placement of memory and threads is the way it affects the movement of data around the machine, and the increased latency this can introduce to reads and writes. In this paper we describe work on modeling the bandwidth requirements of an application on a NUMA compute node based on the placement of threads. The model is constructed by sampling performance counters while the application runs with two carefully chosen thread placements. The results of this modeling can be used in a number of ways, ranging from performance debugging during development, where the programmer can be alerted to potentially problematic memory access patterns; to systems such as Pandia, which take an application and predict the performance and system load of a proposed thread count and placement; to libraries of data structures such as Parallel Collections and Smart Arrays, which can shield the user from memory placement and thread placement issues when parallelizing code.
The Flavour of Real World Vulnerability Detection and Intelligent Configuration
The Parfait static code analysis tool focuses on detecting vulnerabilities that matter in the C, C++, Java and Python languages. Its focus has been on key items expected out of a commercial tool that lives in a commercial organisation, namely, precision of results (i.e., high true positive rate), scalability (i.e., being able to run quickly over millions of lines of code), incremental analysis (i.e., being able to run over deltas of the code quickly), and usability (i.e., ease of integration into standard build processes, reporting of traces to the vulnerable location, etc.). Today, Parfait is used by thousands of developers at Oracle worldwide on a day-to-day basis. In this presentation we'll sample a flavour of Parfait: we explore some real-world challenges faced in the creation of a robust vulnerability detection tool, look into two examples of vulnerabilities that severely affected the Java platform in 2012/2013 and most machines since 2017, and conclude by recounting what matters to developers for integration into today's continuous integration and continuous delivery (CI/CD) pipelines. Key to the deployment of static code analysis tools is configuration of the tool itself; we present our experiences with the use of machine learning to automatically configure the tool, providing users with a better out-of-the-box experience.
Intelligent Application Security
Over the past 20 years we have seen application security evolve from analysing application code through Static Application Security Testing tools, to detecting vulnerabilities in running applications via Dynamic Application Security Testing tools. The past 10 years have seen new flavours of tools: Software Composition Analysis, Web Application Firewalls, and Runtime Application Self Protection. The past 10 years have also seen an increase in the uptake of the DevOps model that combines software development and operations. Several tools have been developed that make use of machine learning to help developers make quality decisions about their code, tests, or the runtime overhead their code produces. However, little has been done to address application security. This talk focuses on a vision for Intelligent Application Security in the context of the DevSecOps model, where security is integrated into DevOps, by informing program analysis with learning techniques including program synthesis, and keeping track of a knowledge base. What is Intelligent Application Security? Intelligent Application Security aims to provide an automated approach to integrate security into all aspects of application development and operation, at scale, using learning techniques that incorporate signals from the code and beyond, to provide actionable intelligence to developers, security analysts, operations staff, and autonomous systems.
RASPunzel for deserialization in 5 min
In this talk, we show how data-driven allowlist synthesis can help prevent deserialization vulnerabilities, which often lead to remote code execution attacks. Serialization is the process of converting an in-memory object to and re-creating it from a persistent format (e.g. byte stream, JSON, XML, binary). Serialization is present in many languages like Java, Python, Ruby, and C# and it is commonly used to exchange data in distributed systems or across different languages. In many cases, however, it can be exploited by crafting serialised payloads that will trigger arbitrary code upon deserialization. The most common, and insufficient, defence against deserialization attacks are blocklists, which prevent deserialization of known malicious code. Allowlists instead restrict deserialization to known benign code, but shift the burden of creating and maintaining the list from security practitioners to developers. In this talk, we show how data-driven allowlist synthesis combined with runtime application self-protection greatly simplifies the creation and enforcement of allowlists while significantly improving security. Through a demo, we will show how a runtime application self-protection (RASP) agent enforcing a synthesized allowlist prevents real-world deserialization attacks without the need to alter or re-compile application code.
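For comparison, the JDK itself supports allowlist-style deserialization filtering via java.io.ObjectInputFilter (JEP 290); RASPunzel synthesizes and enforces allowlists through a RASP agent instead, but the effect of an allowlist can be sketched with the standard API:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.ObjectInputFilter;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;

    public class AllowlistDemo {
        public static void main(String[] args) throws Exception {
            // Serialize a harmless object to stand in for untrusted input.
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(java.time.LocalDate.now());
            }

            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                // Allowlist: only java.time classes may deserialize; everything
                // else is rejected before any code of the class can run.
                in.setObjectInputFilter(ObjectInputFilter.Config.createFilter("java.time.*;!*"));
                System.out.println("Deserialized: " + in.readObject());
            }
        }
    }

The pattern "java.time.*;!*" rejects every class outside the allowlisted package; a synthesized allowlist plays the same role, but is derived from observed application behavior rather than written by hand.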
Private Cross-Silo Federated Learning for Extracting Vaccine Adverse Event Mentions
Automatically extracting mentions of suspected drug or vaccine adverse events (potential side effects) from unstructured text is critical in the current pandemic, but small amounts of labeled training data remain silo-ed across organizations due to privacy concerns. Federated Learning (FL) is quickly becoming a go-to distributed training paradigm for such users to jointly train a more accurate global model without physically sharing their data. However, literature on successful application of FL in real-world problem settings is somewhat sparse. In this paper, we describe our experience applying an FL-based solution to the Named Entity Recognition (NER) task for an adverse event detection application in the context of mass scale vaccination programs. Furthermore, we show that Differential Privacy (DP), which offers stronger privacy guarantees, severely cripples the global model's prediction accuracy, thus disincentivizing users from participating in the federation. We demonstrate how recent innovations in personalization methods can help significantly recover the lost accuracy.
Automated GPU Out-of-Bound Access Detection and Prevention in a Managed Environment
GPUs have proven extremely effective at accelerating general-purpose workloads in fields from numerical simulation to deep learning and finance. However, even code written by experienced GPU programmers often offers little robustness, limiting the GPUs' adoption in critical applications' acceleration. Out-of-bounds array accesses are one of the most common sources of errors and vulnerabilities on GPUs and can be hard to detect and prevent due to the architectural characteristics of GPUs. This work presents an automated technique ensuring detection and protection against out-of-bounds array accesses inside CUDA GPU kernels. We compile kernels ahead-of-time, invoke them at run time using the Graal polyglot Virtual Machine and execute them on the GPU. Our technique is transparent to the user and operates on the LLVM Intermediate Representation. It adds boundary checks for array accesses based on array size knowledge, available at run time thanks to the managed execution environment, and optimizes the resulting code to minimize the impact of our modifications. We test our technique on 16 different GPU kernels extracted from common GPU workloads and show that we can prevent out-of-bounds array accesses in arbitrary GPU kernels without any statistically significant execution time overhead.
Optimizing Inference Performance of Transformers on CPUs
Slides to be presented at the EuroMLSys'21 workshop
Vate: Runtime Adaptable Probabilistic Programming in Java
Inspired by earlier work on Augur, Vate is a probabilistic programming language for the construction of JVM-based models with an object-oriented interface. As a compiled language, it is able to examine the dependency graph of the model to produce optimised code that can be dynamically targeted to different platforms.
CLAMH Introduction
The Cross-Language Microbenchmark Harness (CLAMH) provides a unique environment for running software benchmarks. It is unique in that it allows comparison across different platforms and across different languages. For example, it allows the comparison of clang, gcc, llvm, and GraalVM Sulong on the same benchmark, and can also be used to compare the Java counterparts of the same benchmark running on any JVM. CLAMH allows users to verify vendor benchmark performance claims, baseline benchmark performance in their own compute environment, compare with other compute environments, and, by so doing, identify areas where performance can be improved. CLAMH has been released Open Source in the GraalVM repository - https://github.com/graalvm/CLAMH
MSET2 Streaming Prognostics for IoT Telemetry on Oracle Roving Edge Infrastructure
Critical applications needed in real-world environments would be difficult or impossible to execute on the public cloud alone because of the massive bandwidth and latency needed to transmit and process vast amounts of data, as well as to offer instant responses to the results of that analysis. Oracle's MSET2 prognostic ML algorithm, implemented on Roving Edge Clusters with NVIDIA Tesla T4 GPUs, attains unprecedented reductions in computational latency and breakthrough throughput acceleration factors for large-scale ML streaming prognostics from dense-sensor fleets of assets in such fields as U.S. Department of Defense assets, utilities, oil & gas, commercial aviation, and prognostic cybersecurity for data center IT assets as well as DoD supervisory control and data acquisition assets and networks, and smart manufacturing.
Python on GraalVM – A Many-Sided World (original title: Python auf GraalVM – eine vielfältige Welt)
Presentation at enterPy conference (https://www.enterpy.de/) a German business-oriented Python conference. The slides are almost the same as those presented at OOW CodeOne 2019 (approved here: http://ol-archivist.us.oracle.com/archivist/document/2019-0905), updated for new features, URLs, performance numbers, and compatibility.
IFDS Taint Analysis With Access Paths
Over the years, static taint analysis emerged as the analysis of choice to detect some of the most common web application vulnerabilities, such as SQL injection (SQLi) and cross-site scripting (XSS). Furthermore, from an implementation perspective, the IFDS dataflow framework stood out as one of the most successful vehicles to implement static taint analysis for real-world Java applications. While existing approaches scale reasonably to medium-size applications (e.g. up to one hour analysis time for less than 100K lines of code), our experience suggests that no existing solution can scale to very large industrial code bases (e.g. more than 1M lines of code). In this paper, we present our novel IFDS-based solution to perform fast and precise static taint analysis of very large industrial Java web applications. Similar to state-of-the-art approaches to taint analysis, our IFDS-based taint analysis uses access paths to abstract objects and fields in a program. However, contrary to existing approaches, our analysis is demand-driven, which restricts the amount of code to be analyzed, and does not rely on a computationally expensive alias analysis, thereby significantly improving scalability.
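A minimal sketch of the access-path abstraction (representation ours, for illustration): a tainted value is identified by a base variable plus a bounded chain of field accesses, so request.param.name becomes base request with fields [param, name].

    import java.util.ArrayList;
    import java.util.List;

    // Access path for taint tracking: a base variable plus a bounded chain of
    // field accesses. Truncating at a fixed depth (k-limiting) keeps the
    // dataflow domain finite, at the cost of over-approximation.
    record AccessPath(String base, List<String> fields) {
        static final int MAX_DEPTH = 3; // k-limit on field chains

        // Extend request.param into request.param.name; beyond the k-limit the
        // truncated path is kept, standing in for all longer paths.
        AccessPath append(String field) {
            if (fields.size() >= MAX_DEPTH) {
                return this;
            }
            List<String> extended = new ArrayList<>(fields);
            extended.add(field);
            return new AccessPath(base, List.copyOf(extended));
        }

        // A path subsumes another if it is a prefix of it on the same base.
        boolean subsumes(AccessPath other) {
            return base.equals(other.base)
                    && other.fields.size() >= fields.size()
                    && other.fields.subList(0, fields.size()).equals(fields);
        }
    }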
Generality—or Not—in a Domain-Specific Language (A Case Study)
Slides for an invited keynote at the
GraalPython: Not Only a Jython Replacement
Rough translation of the abstract: GraalPython is an alternative Python 3 implementation built on GraalVM. GraalPython offers faster Python code execution, embeddability into Java applications similarly to Jython, and integration with other GraalVM languages and tools, such as R, JavaScript, the VS Code debugger, or the CPU profiler. Unlike Jython, GraalPython also aims to support native Python extensions such as numpy or pandas. Apart from a general introduction to GraalPython, we will look at some interesting aspects of CPython internals that have over time become a contract many Python developers rely on, and at how alternative Python implementations can deal with that.
Are many heaps better than one?
The recent introduction by Intel of widely available Non-Volatile RAM has reawakened interest in persistence, a hot topic of the 1980s and 90s. The most ambitious schemes of that era were not adopted; I will speculate as to why, and introduce a new approach based on multiple heaps, designed to overcome the problems. I'll present the main features of the new persistence model, and describe a prototype implementation I've been working on for GraalVM Native Image. The purpose of this work-in-progress is to allow experimentation with the new model, so that the community can assess its desirability. I'll outline the main features of the prototype and some of the remaining challenges.
Fast and Efficient Java Microservices With GraalVM @ Oracle Developer Live
Slides for the Oracle Developer Live - Java Innovations conference. The talk focuses on the benefits of Native Image and recent updates.
How to program machine learning in Java with the Tribuo library
Tribuo is a new open source library written in Java from Oracle Labs’ Machine Learning Research Group. The team’s goal for Tribuo is to build an ML library for the Java platform that is more in line with the needs of large software systems. Tribuo operates on objects, not primitive arrays, Tribuo’s models are self-describing and reproducible, and it provides a uniform interface over many kinds of prediction tasks.
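A small example of what training and inspecting a classifier looks like, adapted from the style of Tribuo's public tutorials; the file name and label column are placeholders.

    import java.nio.file.Paths;
    import org.tribuo.MutableDataset;
    import org.tribuo.classification.Label;
    import org.tribuo.classification.LabelFactory;
    import org.tribuo.classification.sgd.linear.LogisticRegressionTrainer;
    import org.tribuo.data.csv.CSVLoader;

    public class TribuoDemo {
        public static void main(String[] args) throws Exception {
            // Load a labelled CSV; the LabelFactory marks this as a classification task.
            CSVLoader<Label> loader = new CSVLoader<>(new LabelFactory());
            MutableDataset<Label> train =
                    new MutableDataset<>(loader.loadDataSource(Paths.get("train.csv"), "label"));

            // Train a logistic regression; the returned model automatically
            // carries the provenance of its data, trainer, and hyperparameters.
            var model = new LogisticRegressionTrainer().train(train);
            System.out.println(model.getProvenance());
        }
    }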
ColdPress: An Extensible Malware Analysis Platform for Threat Intelligence
Malware analysis is still largely a manual task. This slow and inefficient approach does not scale to the exponential rise in the rate of new unique malware generated. Hence, automating the process as much as possible becomes desirable. In this paper, we present ColdPress – an extensible malware analysis platform that automates the end-to-end process of malware threat intelligence gathering, with integrated output modules to generate reports in arbitrary file formats. ColdPress combines state-of-the-art tools and concepts into a modular system that aids the analyst to efficiently and effectively extract information from malware samples. It is designed as a user-friendly platform that can be easily extended with user-defined modules. We evaluated ColdPress with complex real-world malware samples (e.g., WannaCry), demonstrating its efficiency, performance and usefulness to security analysts. Our demo video is available at https://youtu.be/AwlBo1rxR1U.
Online Post-Processing in Rankings for Fair Utility Maximization
We consider the problem of utility maximization in online ranking applications while also satisfying a pre-defined fairness constraint. We consider batches of items which arrive over time, already ranked using an existing ranking model. We propose online post-processing for re-ranking these batches to enforce adherence to the pre-defined fairness constraint, while maximizing a specific notion of utility. To achieve this goal, we propose two deterministic re-ranking policies. In addition, we learn a re-ranking policy based on a novel variation of learning to search. Extensive experiments on real world and synthetic datasets demonstrate the effectiveness of our proposed policies both in terms of adherence to the fairness constraint and utility maximization. Furthermore, our analysis shows that the performance of the proposed policies depends on the original data distribution w.r.t the fairness constraint and the notion of utility.
Formal Verification of Authenticated, Append-Only Skip Lists in Agda: Extended Version
Authenticated Append-Only Skiplists (AAOSLs) enable maintenance and querying of an authenticated log (such as a blockchain) without requiring any single party to store or verify the entire log, or to trust another party regarding its contents. AAOSLs can help to enable efficient dynamic participation (e.g., in consensus) and reduce storage overhead. In this paper, we formalize an AAOSL originally described by Maniatis and Baker, and prove its key correctness properties. Our model and proofs are machine checked in Agda. Our proofs apply to a generalization of the original construction and provide confidence that instances of this generalization can be used in practice. Our formalization effort has also yielded some simplifications and optimizations.
CSR++: A Fast, Scalable, Update-Friendly Graph Data Structure
The graph model enables a broad range of analysis, thus graph processing is an invaluable tool in data analytics. At the heart of every graph-processing system lies a concurrent graph data structure storing the graph. Such a data structure needs to be highly efficient for both graph algorithms and queries. Due to the continuous evolution, the sparsity, and the scale-free nature of real-world graphs, graph-processing systems face the challenge of providing an appropriate graph data structure that enables both fast analytic workloads and low-memory graph mutations. Existing graph structures offer a hard trade-off between read-only performance, update friendliness, and memory consumption upon updates. In this paper, we introduce CSR++, a new graph data structure that removes these trade-offs and enables both fast read-only analytics and quick and memory-friendly mutations. CSR++ combines ideas from CSR, the fastest read-only data structure, and adjacency lists to achieve the best of both worlds. We compare CSR++ to CSR, adjacency lists from the Boost Graph Library, and LLAMA, a state-of-the-art update-friendly graph structure. In our evaluation, which is based on popular graph-processing algorithms executed over real-world graphs, we show that CSR++ remains close to CSR in read-only concurrent performance (within 10% on average), while significantly outperforming CSR (by an order of magnitude) and LLAMA (by almost 2x) with frequent updates.
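A deliberately simplified Java sketch of the hybrid idea, not the actual CSR++ layout: reads run over contiguous CSR arrays, while updates touch only a small per-vertex overflow list.

    import java.util.ArrayList;
    import java.util.function.IntConsumer;

    // Simplified hybrid of CSR and adjacency lists, in the spirit of CSR++:
    // read-only traversals stay on the contiguous CSR arrays; mutated vertices
    // spill into lazily allocated overflow lists.
    final class HybridGraph {
        private final int[] offsets;              // CSR row offsets, length |V| + 1
        private final int[] edges;                // CSR neighbor array
        private final ArrayList<Integer>[] extra; // per-vertex overflow lists

        @SuppressWarnings("unchecked")
        HybridGraph(int[] offsets, int[] edges) {
            this.offsets = offsets;
            this.edges = edges;
            this.extra = new ArrayList[offsets.length - 1];
        }

        // A mutation touches only the overflow list of one vertex, leaving the
        // read-optimized CSR arrays untouched.
        synchronized void addEdge(int src, int dst) {
            if (extra[src] == null) {
                extra[src] = new ArrayList<>();
            }
            extra[src].add(dst);
        }

        // Read-only traversal: a tight loop over the contiguous CSR segment,
        // then the (usually empty) overflow list.
        void forEachNeighbor(int v, IntConsumer action) {
            for (int i = offsets[v]; i < offsets[v + 1]; i++) {
                action.accept(edges[i]);
            }
            if (extra[v] != null) {
                for (int dst : extra[v]) {
                    action.accept(dst);
                }
            }
        }
    }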
GraalVM: introduction and experiences from the implementation of R and Python
GraalVM is a multilingual VM built on top of the JVM and developed by Oracle Labs. Part of the development team is located in Prague. The talk will introduce GraalVM and its various components: the JIT compiler, AOT compilation, and the Truffle language implementation framework. We will also discuss the realities of developing alternative implementations of existing and well-established programming languages such as JavaScript, R, and Python.
A Latina in Tech
Having started my Computer Science degree while growing up in Colombia and later completing it in Australia, I went from being an overrepresented Latina to being an underrepresented one. Further, the female to male ratio in CS in both countries was also rather different.
As a mum, a wife, a teacher, a researcher, a manager, and a leader, in this talk I share some of the lessons learnt throughout my career, with examples of successes and failures from my PhD, academic life, and industrial research life.
The University of Queensland and Oracle team up to develop world-class cyber security experts
The field of cyber security is coming of age, with more than a million job openings globally, including many in Australia, and a strong move from reactive to preventative security taking form. At The University of Queensland, teaming up with industry specialists like Oracle Labs – the research and development branch of global technology firm Oracle – will ensure both industry and researchers can focus on the real issues that businesses and users care about.
Private Federated Learning with Domain Adaptation
In a federated learning (FL) system, users can collaborate to build a shared model without explicitly sharing data, but model accuracy degrades if differential privacy guarantees are required during training. We hypothesize that domain adaptation techniques can effectively address this problem while increasing per-user prediction accuracy, especially when user data comes from disparate distributions. We present and analyze a mixture of experts (MoE) based domain adaptation approach that allows effective collaboration between users in a differentially private FL setting. Each user contributes to (and benefits from) a general, shared model to perform a common task, while maintaining a private model to adjust their predictions to their particular domain. Using both synthetic and real-world datasets, we empirically demonstrate that these private models can increase accuracy, while protecting against the release of users’ private data.
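The mixture can be written schematically (notation ours): for user u with private expert f_u, shared model f_G, and learned gating weight g_u,

    \[
    \hat{y}_u(x) \;=\; g_u(x)\, f_u(x) + \bigl(1 - g_u(x)\bigr)\, f_G(x), \qquad g_u(x) \in [0,1].
    \]

In this setup the private expert and gate never leave the user's device, so the differential privacy machinery applies only to the training of the shared model f_G.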
Example-based Live Programming for Everyone: Building Language-agnostic Tools for Live Programming With LSP and GraalVM
Our community has explored various approaches to improve the programming experience. Although many of them, such as Example-Based Live Programming (ELP), have been shown to be effective, they are still not widespread in conventional programming environments. One reason for that is the effort required to provide sophisticated tools that rely on run-time information. To target multiple language ecosystems, it is often necessary to implement the same concepts for different languages and runtimes. Two emerging technologies present an opportunity to reduce this effort significantly: the Language Server Protocol (LSP) and language implementation frameworks such as GraalVM's Truffle. In this paper, we show how an ELP system can be built in a language-agnostic way by leveraging these two technologies. Based on our approach, we implemented the Babylonian Programming system, an ELP system that has previously only been implemented for exploratory ecosystems. Our system, on the other hand, brings ELP for all languages supported by the GraalVM to Visual Studio Code (VS Code). Moreover, we outline what a language-agnostic infrastructure needs to provide and how the LSP could be extended to support ELP independently of programming environments. Further, we demonstrate how our approach enables the use of ELP in the context of polyglot programming. We illustrate the consequences of our approach by discussing its advantages and limitations and by comparing the features of our system to other ELP systems. Moreover, we give an outlook on how tools that rely on run-time information could be built in the future. This in turn might motivate future tool builders and researchers to consider implementing more tools in a language-agnostic way from the start to make them available to a broader audience.
Women in CS panel
While women were among the first programmers in the 20th century, and contributed substantially to the industry, over the years both the CS industry and CS academia got dominated by men. In this social hour, we explore the opportunities and challenges women encounter in Computer Science through a panel discussion. Our panelists are women who have leading roles in industry, academia, and industrial research. By sharing stories via Q&A, we look forward to inspiring younger women to fulfill their highest potentials, understand how women can make it to senior positions, and enjoy their career.
UnQuantize: Overcoming Signal Quantization Effects in IoT Time-Series Databases
Low-resolution quantized time-series signals present a challenge to big-data Machine Learning (ML) prognostics in IoT industrial and transportation applications. The challenge of detecting anomalies in monitored sensor signals is compounded by the fact that many industries today use 8-bit sample-and-hold analog-to-digital (A/D) converters for almost all physical transducers throughout the system. This results in the signal values being severely quantized, which adversely affects the predictive power of prognostic algorithms and can elevate empirical false-alarm and missed-alarm probabilities. Quantized signals are dense and indecipherable to the human eye, and ML algorithms are challenged to detect the onset of degradation in monitored assets due to the loss of information in the digitization process. This paper presents an autonomous ML framework that detects and classifies quantized signals and then instantiates one of two separate techniques (depending on the level of quantization) to efficiently unquantize digitized signals, returning high-resolution signals possessing the same accuracy as signals sampled with higher-bit A/D chips. This new "UnQuantize" framework works in line with streaming sensor signals, upstream from the core ML anomaly detection algorithm, yielding substantially higher anomaly-detection sensitivity with much lower false-alarm and missed-alarm probabilities (FAPs/MAPs).
Scalable, Near-Zero Loss Disaster Recovery for Distributed Data Stores
This paper presents a new Disaster Recovery (DR) system, called Slogger, that differs from prior works in two principal ways: (i) Slogger enables DR for a linearizable distributed data store, and (ii) Slogger adopts a continuous backup approach that strives to maintain a tiny lag on the backup site relative to the primary site, thereby restricting the data loss window, due to disasters, to milliseconds. These goals pose a significant set of challenges related to consistency of the backup site's state, failures, and scalability. Slogger employs a combination of asynchronous log replication, intra-data center synchronized clocks, pipelining, batching, and a novel watermark service to address these challenges. Furthermore, Slogger is designed to be deployable as an "add-on" module in an existing distributed data store with few modifications to the original code base. Our evaluation, conducted on Slogger extensions to a 32-sharded version of LogCabin, an open source key-value store, shows that Slogger maintains a very small data loss window of 14.2 milliseconds, which is near the optimal value in our evaluation setup. Moreover, Slogger reduces the length of the data loss window by 50% compared to an incremental snapshotting technique without having any performance penalty on the primary data store. Furthermore, our experiments demonstrate that Slogger achieves our other goals of scalability, fault tolerance, and efficient failover to the backup data store when a disaster is declared at the primary data store.
Leveraging Extracted Model Adversaries for Improved Black Box Attacks
We present a method for adversarial input generation against black box models for reading comprehension based question answering. Our approach is composed of two steps. First, we approximate a victim black box model via model extraction. Second, we use our own white box method to generate input perturbations that cause the approximate model to fail. These perturbed inputs are used against the victim. In experiments we find that our method improves on the efficacy of the AddAny attack (a white box attack) performed on the approximate model by 25% F1, and on the AddSent attack (a black box attack) by 11% F1.
Simplifying GPU Access: A Polyglot Binding for GPUs with GraalVM
GPU computing accelerates workloads and fuels breakthroughs across industries. There are many GPU-accelerated libraries developers can leverage, but integrating these libraries into existing software stacks can be challenging. Programming GPUs typically requires low-level programming, while high-level scripting languages have become very popular. Accelerated computing solutions are heterogeneous and inherently more complex. We'll present an open-source prototype called grCUDA that leverages Oracle’s GraalVM and exposes GPUs in polyglot environments. While GraalVM can be regarded as the "one VM to rule them all," grCUDA is the "one GPU binding to rule them all." Data is efficiently shared between GPUs and GraalVM languages (R, Python, JavaScript) while GPU kernels can be launched directly from those languages. Precompiled GPU kernels can be used, as well as kernels that are generated at runtime. We'll also show how to access GPU-accelerated libraries such as RAPIDS cuML.
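A small Java example of the kind of binding described, following the published grCUDA samples; the exact expression syntax and function signatures may have evolved since.

    import org.graalvm.polyglot.Context;
    import org.graalvm.polyglot.Value;

    public class GrCudaDemo {
        public static void main(String[] args) {
            try (Context ctx = Context.newBuilder().allowAllAccess(true).build()) {
                // Allocate a 1000-element device array via a grCUDA expression.
                Value x = ctx.eval("grcuda", "float[1000]");
                for (int i = 0; i < x.getArraySize(); i++) {
                    x.setArrayElement(i, i);
                }

                // Compile a CUDA C kernel at run time and bind it by name and signature.
                String code = "__global__ void inc(float *x, int n) {"
                        + " int i = blockIdx.x * blockDim.x + threadIdx.x;"
                        + " if (i < n) x[i] += 1.0f; }";
                Value buildkernel = ctx.eval("grcuda", "buildkernel");
                Value inc = buildkernel.execute(code, "inc", "pointer, sint32");

                // Launch on a grid of 8 blocks x 128 threads, then read back.
                inc.execute(8, 128).execute(x, 1000);
                System.out.println(x.getArrayElement(0)); // 1.0
            }
        }
    }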
User-defined Interface Mappings for the GraalVM
To improve programming productivity, the right tools are crucial. This starts with the choice of the programming language, which often predetermines the libraries and frameworks one can use. Polyglot runtime environments, such as GraalVM, provide mechanisms for exchanging objects and sending messages across language boundaries, which allow developers to combine different languages, libraries, and frameworks with each other. However, polyglot application developers are obligated to properly use the right interfaces for accessing their data and objects from different languages.
To reduce the mental complexity for developers and let them focus on the business logic, we introduce user-defined interface mappings - an approach for adapting cross-language messages at run-time to match an expected interface. Thereby, the translation strategies are defined in an exchangeable and easy-to-edit configuration file. Thus, different stakeholders ranging from library and framework developers up to application developers can use and extend these mappings for their needs.
Toward Presizing and Pretransitioning Strategies for GraalPython
Presizing and pretransitioning are run-time optimizations that reduce reallocations of lists. These two optimizations have previously been implemented (together with pretenuring) using Mementos in the V8 JavaScript engine. The design of Mementos, however, relies on the support of the garbage collector (GC) of the V8 runtime system.
In contrast to V8, dynamic language runtimes written for the GraalVM do not have access to the GC. Thus, the prior work cannot be applied directly. Instead, an alternative implementation approach without reliance on the GC is needed and poses different challenges.
In this paper we explore and analyze an approach for implementing these two optimizations in the context of GraalVM, using the Python implementation for GraalVM as an example. We substantiate this analysis with preliminary performance numbers from our prototype, with which we tested different presizing strategies.
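A minimal sketch of the presizing idea, in plain Java and independent of GraalPython's actual design: each allocation site records the largest final list size it has observed and uses it as the initial capacity the next time around, avoiding repeated grow-and-copy reallocations.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical feedback table keyed by allocation site.
    final class PresizingFeedback {
        private final Map<Integer, Integer> observedSizeBySite = new HashMap<>();

        // Allocate with the capacity hint learned from earlier executions.
        <T> List<T> newListAt(int allocationSiteId) {
            int hint = observedSizeBySite.getOrDefault(allocationSiteId, 10);
            return new ArrayList<>(hint);
        }

        // Record the final size once the list has stopped growing.
        void recordFinalSize(int allocationSiteId, List<?> list) {
            observedSizeBySite.merge(allocationSiteId, list.size(), Math::max);
        }
    }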
Polyglot Code Finder
With the increasing complexity of software, it becomes even more important to build on the work of others. At the same time, websites, such as Stack Overflow or GitHub, are used by millions of developers to host their code, which could potentially be reused.
The process of finding the right code, however, is often time-consuming. In addition, the right solution may be written in a programming language that does not fit the developer's requirements. Current approaches to automate code search allow users to search for code based on keywords and transformation rules, but they are limited to one programming language.
Our approach enables developers to find code for reuse written in different languages, which is especially useful when building polyglot applications. In addition to conventional search filters, users can filter code by providing example input and expected output. Based on our approach, we have implemented a tool prototype in GraalSqueak. We evaluate both approach and prototype with an experience report.
Non-blocking interpolation search trees with doubly-logarithmic running time
Balanced search trees typically use key comparisons to guide their operations, and achieve logarithmic running time. By relying on numerical properties of the keys, interpolation search achieves lower search complexity and better performance. Although interpolation-based data structures were investigated in the past, their non-blocking concurrent variants have received very little attention so far.
In this paper, we propose the first non-blocking implementation of the classic interpolation search tree (IST) data structure. For arbitrary key distributions, the data structure ensures worst-case O(log n + p) amortized time for search, insertion and deletion traversals. When the input key distributions are smooth, lookups run in expected O(log log n + p) time, and insertion and deletion run in expected amortized O(log log n + p) time, where p is a bound on the number of threads. To improve the scalability of concurrent insertion and deletion, we propose a novel parallel rebuilding technique, which should be of independent interest.
We evaluate whether the theoretical improvements translate to practice by implementing the concurrent interpolation search tree, and benchmarking it on uniform and nonuniform key distributions, for dataset sizes in the millions to billions of keys. Relative to the state-of-the-art concurrent data structures, the concurrent interpolation search tree achieves performance improvements of up to 15% under high update rates, and of up to 50% under moderate update rates. Further, ISTs exhibit up to 2x fewer cache misses and consume 1.2x to 2.6x less memory compared to the next best alternative on typical dataset sizes. We find that the results are surprisingly robust to distributional skew, which suggests that our data structure can be a promising alternative to classic concurrent search structures.
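For readers unfamiliar with interpolation search, the following sequential Java sketch shows the probing idea the concurrent IST builds on: estimate the position of a key from its numerical value instead of halving the search range. It assumes sorted, non-negative keys whose range does not overflow 64-bit arithmetic, and it is not the paper's concurrent data structure.

    final class InterpolationSearch {
        // Assumes 'sorted' is ascending and key arithmetic fits in a long.
        static int search(long[] sorted, long key) {
            int lo = 0, hi = sorted.length - 1;
            while (lo <= hi && key >= sorted[lo] && key <= sorted[hi]) {
                if (sorted[hi] == sorted[lo]) {
                    return sorted[lo] == key ? lo : -1;
                }
                // Probe position estimated by linear interpolation between
                // the endpoint values rather than the midpoint.
                int pos = lo + (int) ((key - sorted[lo]) * (hi - lo)
                        / (sorted[hi] - sorted[lo]));
                if (sorted[pos] == key) return pos;
                if (sorted[pos] < key) lo = pos + 1; else hi = pos - 1;
            }
            return -1;
        }
    }

On smooth key distributions the expected number of probes drops from O(log n) to O(log log n), which is the bound the tree-structured variant generalizes.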
GraalVM Native Image Deep Dive - part I
This is the first part of the meetup, covering mostly the GraalVM ecosystem, intro to native images, frameworks support, how to get started, etc. David Leopoldseder will cover the way native images are built and compare JIT/AOT.
What is a Secure Programming Language? (POPL slides)
Our most sensitive and important software systems are written in programming languages that are inherently insecure, making the security of the systems themselves extremely challenging. It is often said that these systems were written with the best tools available at the time, so over time with newer languages will come more security. But we contend that all of today’s mainstream programming languages are insecure, including even the most recent ones that come with claims that they are designed to be “secure”. Our real criticism is the lack of a common understanding of what “secure” might mean in the context of programming language design. We propose a simple data-driven definition for a secure programming language: that it provides first-class language support to address the causes for the most common, significant vulnerabilities found in real-world software. To discover what these vulnerabilities actually are, we have analysed the National Vulnerability Database and devised a novel categorisation of the software defects reported in the database. This leads us to propose three broad categories, which account for over 50% of all reported software vulnerabilities, that as a minimum any secure language should address. While most mainstream languages address at least one of these categories, interestingly, we find that none address all three. Looking at today’s real-world software systems, we observe a paradigm shift in design and implementation towards service-oriented architectures, such as microservices. Such systems consist of many fine-grained processes, typically implemented in multiple languages, that communicate over the network using simple web-based protocols, often relying on multiple software environments such as databases. In traditional software systems, these features are the most common locations for security vulnerabilities, and so are often kept internal to the system. In microservice systems, these features are no longer internal but external, and now represent the attack surface of the software system as a whole. The need for secure programming languages is probably greater now than it has ever been.
PGX and Graal/Truffle/Active Libraries
A guest lecture in the CS4200 Compiler Construction course at Delft University of Technology (https://tudelft-cs4200-2019.github.io/) about PGX and Graal/Truffle/Active Libraries.
Computationally Easy, Spectrally Good Multipliers for Congruential Pseudorandom Number Generators
Congruential pseudorandom number generators rely on good multipliers, that is, integers that have good performance with respect to the spectral test. We provide lists of multipliers with a good lattice structure up to dimension eight for generators with typical power-of-two moduli, analyzing in detail multipliers close to the square root of the modulus, whose product can be computed quickly.
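The setting, in code: a congruential generator is fully determined by its multiplier and increment, so the multiplier carries the entire statistical burden. This Java sketch uses Knuth's well-known MMIX constants as a stand-in; the paper's contribution is tables of multipliers with better spectral (lattice) behavior for such power-of-two moduli.

    // x_{n+1} = a * x_n + c (mod 2^64); the modulus is implicit in Java's
    // wrapping 64-bit arithmetic, so each step is one multiply and one add.
    final class Lcg64 {
        private long state;

        Lcg64(long seed) { this.state = seed; }

        long next() {
            // Knuth's MMIX multiplier and increment (illustrative choice).
            state = 6364136223846793005L * state + 1442695040888963407L;
            return state;
        }
    }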
Scalable Pointer Analysis of Data Structures using Semantic Models
Pointer analysis is widely used as a base for different kinds of static analyses and compiler optimizations. Designing a scalable pointer analysis with acceptable precision for use in production compilers is still an open question. Modern object-oriented languages like Java and Scala promote abstractions and code reuse, both of which make it difficult to achieve precision. Collection data structures are an example of a pervasively used component in such languages. But analyzing collection implementations with full context sensitivity leads to prohibitively long analysis times. We use semantic models to reduce the complex internal implementation of, e.g., a collection to a small and concise model. Analyzing the model with context sensitivity leads to precise results with only a modest increase in analysis time. The models must be written manually, which is feasible because a model method usually consists of only a few statements. Our implementation in GraalVM Native Image shows a rise in useful precision (1.35X rise in the number of checkcast statements that can be elided over the default analysis configuration) with a manageable performance cost (19% rise in analysis time).
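The following Java sketch, with invented names, conveys what a semantic model is: a drastically simplified stand-in for a collection that preserves only the loads and stores relevant to points-to analysis, so the analysis can afford full context sensitivity on it. It is not Native Image's actual model code.

    // The growth policy, modCount bookkeeping, and other internals of a real
    // ArrayList are irrelevant to points-to analysis; what matters is that
    // elements flow in via add() and out via get().
    final class ArrayListModel<E> {
        private Object[] elements = new Object[8];
        private int size;

        void add(E e) {
            if (size == elements.length) {
                Object[] bigger = new Object[size * 2];
                System.arraycopy(elements, 0, bigger, 0, size);
                elements = bigger;
            }
            elements[size++] = e; // the only store the analysis must track
        }

        @SuppressWarnings("unchecked")
        E get(int index) {
            return (E) elements[index]; // the only load the analysis must track
        }
    }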
AI Decision Support Prognostics for IoT Asset Health Monitoring, Failure Prediction, Time to Failure
This paper presents a novel tandem human-machine cognition approach for human-in-the-loop control of complex business-critical and mission-critical systems and processes that are monitored by Internet-of-Things (IoT) sensor networks and where it is of utmost importance to mitigate and avoid cognitive overload situations for the human operators. We present an advanced pattern recognition system, called the Multivariate State Estimation Technique-2, which possesses functional requirements designed to minimize the possibility of cognitive overload for human operators. These functional requirements include: (1) ultralow false alarm probabilities for all monitored transducers, components, machines, subsystems, and processes; (2) fastest mathematically possible decisions regarding the incipience or onset of anomalies in noisy process metrics; and (3) the ability to unambiguously differentiate between sensor degradation events and degradation in the systems/processes under surveillance. The prognostic machine learning innovation presented herein does not replace the role of the human in operation of complex engineering systems, but augments that role in a manner that minimizes cognitive overload by very rapidly processing, interpreting, and displaying final diagnostic and prognostic information to the human operator in a prioritized format that is readily perceived and comprehended.
ContainerStress: Autonomous Cloud-Node Scoping Framework for Big-Data ML Use Cases
Deploying big-data Machine Learning (ML) services in a cloud environment presents a challenge to the cloud vendor with respect to the cloud container configuration sizing for any given customer use cases. Oracle Labs has developed an automated framework that uses nested-loop Monte Carlo simulation to autonomously scale any size customer ML use cases across the range of cloud CPU-GPU “Shapes” (configurations of CPUs and/or GPUs in Cloud containers available to end customers). Moreover, the Oracle Labs and NVIDIA authors have collaborated on a ML benchmark study which analyzes the compute cost and GPU acceleration of any ML prognostic algorithm and assesses the reduction of compute cost in a cloud container comprising conventional CPUs and NVIDIA GPUs.
A DSL-based framework for performance assessment
Performance assessment is an essential verification practice in both research and industry for software quality assurance. Experiment setups for performance assessment tend to be complex. A typical experiment needs to be run for a variety of involved hardware, software versions, system settings and input parameters. Typical approaches for performance assessment are based on scripts. They do not document all variants explicitly, which makes it hard to analyze and reproduce experiment results correctly. In general they tend to be monolithic which makes it hard to extend experiment setups systematically and to reuse features such as result storage and analysis consistently across experiments. In this paper, we present a generic approach and a DSL-based framework for performance assessment. The DSL helps the user to set and organize the variants in an experiment setup explicitly. The Runtime module in our framework executes experiments after which results are stored together with the corresponding setups in a database. Database queries provide easy access to the results of previous experiments and the correct analysis of experiment results in context of the experiment setup. Furthermore, we describe operations for common problems in performance assessment such as outlier detection. At Oracle, we successfully instantiate the framework and use it to nightly assess the performance of PGX [12, 6], a toolkit for parallel graph analytics.
Maximizing Performance with GraalVM
The GraalVM project enhances the Java ecosystem with an integrated, polyglot, high-performance execution environment for dynamic, static, and native languages. GraalVM supports Java, Scala, Kotlin, Groovy, and other JVM-based languages. At the same time, it can run the dynamic scripting languages JavaScript (including Node.js), Ruby, R, and Python. In this workshop we will discuss the best practices for Java code and compiler configurations to maximize performance with GraalVM and how to measure performance in a reliable manner. We will talk about how to achieve minimal memory footprint and binary size using GraalVM Native Image — programs compiled ahead of time to native executables. A comparison of profile-guided optimizations for ahead-of-time compilation and just-in-time compilation will show the benefits and drawbacks of the two approaches. After this session you will have a better idea of how to use GraalVM to its maximum potential to run your applications faster.
GraalVM Intro Talk
Basic overview of GraalVM features and capabilities; I'll have one more session to talk about performance/native images
GraalVM Performance Talk
Second talk for Devoxx Ukraine with focus on performance/native images; The first one will be a basic introduction. Heavily relying on the Code One talk by Thomas.
Towards Efficient, Multi-Language Dynamic Taint Analysis
Poster for SPLASH'19 Poster Show, Companion to accepted paper at MPLR'19: http://ol-archivist.us.oracle.com/archivist/document/2019-0715 Submitting again in replacement of http://ol-archivist.us.oracle.com/archivist/document/2019-1008 to be able to change document visibility from public to published to the website.
Towards Efficient, Multi-Language Dynamic Taint Analysis
Presentation for accepted paper at MPLR'19 (https://conf.researchr.org/home/mplr-2019) : http://ol-archivist.us.oracle.com/archivist/document/2019-0715 Submitting again in replacement of http://ol-archivist.us.oracle.com/archivist/document/2019-1009 to be able to change document visibility from public to published to the website.
GraalSqueak: Toward a Smalltalk-based Tooling Platform for Polyglot Programming
Polyglot programming provides software developers with a broader choice in terms of software libraries and frameworks available for building applications. Previous research and engineering activities have focused on language interoperability and the design and implementation of fast polyglot runtimes. To make polyglot programming more approachable for developers, novel software development tools are needed that help them build polyglot applications. We believe a suitable prototyping platform helps to more quickly evaluate new ideas for such tools. In this paper we present GraalSqueak, a Squeak/Smalltalk virtual machine implementation for the GraalVM. We report our experience implementing GraalSqueak, evaluate the performance of the language and the programming environment, and discuss how the system can be used as a tooling platform for polyglot programming.
Design Space Exploration of Power Delivery For Advanced Packaging Technologies
***Note: this is work the VLSI Research group did with Prof. Bakir back in 2017. The student is now graduating, and wants to finalize/publish this work.*** In this paper, a design space exploration of power delivery networks is performed for multi-chip 2.5-D and 3-D IC technologies. The focus of the paper is the effective placement of the voltage regulator modules (VRMs) for power supply noise (PSN) suppression. Multiple on-package VRM configurations have been analyzed and compared. Additionally, 3-D IC chip-on-VRM and backside-of-the-package VRM configurations are studied. From the PSN perspective, the 3-D IC chip-on-VRM case suppresses the PSN the most, even with high current density hotspots. The paper also studies the impact on the PSN of parameters such as the VRM-chip distance on the package and the on-chip decoupling capacitor density.
GraalVM Slides for JAX London
These are intro-level slides to be presented at https://jaxlondon.com
Computers and Hacking: A 50-Year View
[Slides for a 20-minute keynote talk for the MIT HackMIT hackathon weekend, Saturday, September 14, 2019.] Fifty years ago, computers were expensive, institutional rather than personal, and hard to get access to. Today computers are relatively inexpensive, personal as well as institutional, and ubiquitous. Some of the best hacking—and engineering—today involves creative use of relatively limited (and therefore cost-effective) computer resources.
Vandal: A scalable security analysis framework for smart contracts
The rise of modern blockchains has facilitated the emergence of smart contracts: autonomous programs that live and run on the blockchain. Smart contracts have seen a rapid climb to prominence, with applications predicted in law, business, commerce, and governance. Smart contracts are commonly written in a high-level language such as Ethereum's Solidity, and translated to compact low-level bytecode for deployment on the blockchain. Once deployed, the bytecode is autonomously executed, usually by a Turing-complete virtual machine. As with all programs, smart contracts can be highly vulnerable to malicious attacks due to deficient programming methodologies, languages, and toolchains, including buggy compilers. At the same time, smart contracts are also high-value targets, often commanding large amounts of cryptocurrency. Hence, developers and auditors need security frameworks capable of analysing low-level bytecode to detect potential security vulnerabilities.
Maximizing Performance with GraalVM
The GraalVM project enhances the Java ecosystem with an integrated, polyglot, high-performance execution environment for dynamic, static, and native languages. GraalVM supports Java, Scala, Kotlin, Groovy, and other JVM-based languages. At the same time, it can run the dynamic scripting languages JavaScript (including Node.js), Ruby, R, and Python. In this session we will discuss the best practices for Java code and compiler configurations to maximize performance with GraalVM and how to measure performance in a reliable manner. We will talk about how to achieve minimal memory footprint and binary size of GraalVM native images — programs compiled ahead of time to native executables. A comparison of profile-guided optimizations for ahead-of-time compilation and just-in-time compilation will show the benefits and drawbacks of the two approaches. After this session you will have a better idea of how to use GraalVM to its maximum potential to run your applications faster!
Renaissance: Benchmarking Suite for Parallel Applications on the JVM
Established benchmark suites for the Java Virtual Machine (JVM), such as DaCapo, ScalaBench, and SPECjvm2008, lack workloads that take advantage of the parallel programming abstractions and concurrency primitives offered by the JVM and the Java Class Library. However, such workloads are fundamental for understanding the way in which modern applications and data-processing frameworks use the JVM's concurrency features, and for validating new just-in-time (JIT) compiler optimizations that enable more efficient execution of such workloads. We present Renaissance, a new benchmark suite composed of modern, real-world, concurrent, and object-oriented workloads that exercise various concurrency primitives of the JVM. We show that the use of concurrency primitives in these workloads reveals optimization opportunities that were not visible with the existing workloads. We use Renaissance to compare performance of two state-of-the-art, production-quality JIT compilers (HotSpot C2 and Graal), and show that the performance differences are more significant than on existing suites such as DaCapo and SPECjvm2008. We also use Renaissance to expose four new compiler optimizations, and we analyze the behavior of several existing ones.
Commit-time incremental analysis
Most changes to large systems that have been deployed are quite small compared to the size of the entire system. While standard summary-based analyses reduce the amount of code that is reanalysed, they nevertheless analyse code that has not changed. For example, a backward summary-based analysis will examine all the callers of the changed code even if the callers themselves have not changed. In this paper we present a novel approach of keeping summaries of the callers (called forward summaries) that enables one to analyse only the changed code. An evaluation of this approach on two representative examples demonstrates that the overhead associated with generating the forward summaries is recovered by performing just one or two incremental analyses. Thus, this technique can be used at commit time, where only the changed code is available.
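A minimal Java sketch of the forward-summary idea, with all names invented: each method keeps a digest of the facts flowing in from its callers, so at commit time only the changed methods are reanalyzed against stored caller facts instead of revisiting the callers themselves.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical sketch: forward summaries digest what flows in from callers.
    final class CommitTimeAnalysis {
        record ForwardSummary(Set<String> factsFromCallers) {}

        private final Map<String, ForwardSummary> forwardSummaries = new HashMap<>();

        // At commit time, only the changed methods are reanalyzed; their
        // unchanged callers are represented by the stored summaries.
        void analyzeCommit(Set<String> changedMethods) {
            for (String method : changedMethods) {
                ForwardSummary callers =
                        forwardSummaries.getOrDefault(method, new ForwardSummary(Set.of()));
                reanalyze(method, callers.factsFromCallers());
            }
        }

        private void reanalyze(String method, Set<String> seedFacts) {
            // ... intraprocedural analysis seeded with the callers' facts ...
        }
    }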
What is a Secure Programming Language? (lecture + tutorial)
Lecture and tutorial using GraalVM and Simple Language.
What is a Secure Programming Language?
Our most sensitive and important software systems are written in programming languages that are inherently insecure, making the security of the systems themselves extremely challenging. It is often said that these systems were written with the best tools available at the time, so over time with newer languages will come more security. But we contend that all of today's mainstream programming languages are insecure, including even the most recent ones that come with claims that they are designed to be "secure". Our real criticism is the lack of a common understanding of what "secure" might mean in the context of programming language design. We propose a simple data-driven definition for a secure programming language: that it provides first-class language support to address the causes for the most common, significant vulnerabilities found in real-world software. To discover what these vulnerabilities actually are, we have analysed the National Vulnerability Database and devised a novel categorisation of the software defects reported in the database. This leads us to propose three broad categories, which account for over 50% of all reported software vulnerabilities, that as a minimum any secure language should address. While most mainstream languages address at least one of these categories, interestingly, we find that none address all three. Looking at today's real-world software systems, we observe a paradigm shift in design and implementation towards service-oriented architectures, such as microservices. Such systems consist of many fine-grained processes, typically implemented in multiple languages, that communicate over the network using simple web-based protocols, often relying on multiple software environments such as databases. In traditional software systems, these features are the most common locations for security vulnerabilities, and so are often kept internal to the system. In microservice systems, these features are no longer internal but external, and now represent the attack surface of the software system as a whole. The need for secure programming languages is probably greater now than it has ever been.
Non-Volatile Memory and Java: Part 3
A series of short articles about the impact of non-volatile memory (NVM) on the Java platform. In the first two articles I described the main hardware and software characteristics of Intel’s new Optane persistent memory. In this article I will discuss the implications of these characteristics on how we build software.
Non-volatile memory and Java, part 2: the view from software
In the first article I described the main hardware characteristics of Intel’s new Optane persistent memory. In this article I will discuss several software issues.
Non-volatile memory and Java: 1. Introducing NVM
Non-volatile RAM (NVRAM) has arrived into the computing mainstream. This development is likely to be highly disruptive: it will change the economics of the memory hierarchy by providing a new, intermediate level between DRAM and flash, but fully exploiting the new technology will require widespread changes in how we architect and write software. Despite this, there is surprisingly little awareness on the part of programmers (and their management) of the technology and its likely impact, and relatively little activity in academia (compared to the magnitude of the paradigm shift) in developing techniques and tools which programmers will need to respond to the change. In this series I will discuss the possible impact of NVRAM on the Java ecosystem. Java is the most widely used programming language: there are millions of Java developers and billions of lines of Java code in daily use.
PolyJuS: A Squeak/Smalltalk-based Polyglot Notebook System for the GraalVM
Jupyter notebooks are used by data scientists to publish their research in an executable format. These notebooks are usually limited to a single programming language. Current polyglot notebooks extend this concept by allowing multiple languages per notebook, but this comes at the cost of having to externalize and to import data across languages. Our approach for polyglot notebooks is able to provide a more direct programming experience by executing notebooks on top of a polyglot execution environment, allowing each code cell to directly access foreign data structures and to call foreign functions and methods. We implemented this approach using GraalSqueak, a Squeak/Smalltalk implementation for the GraalVM. To prototype the programming experience and experiment with further polyglot tool support, we build a Squeak/Smalltalk-based notebook UI that is compatible with the Jupyter notebook file format. We evaluate PolyJuS by demonstrating an example polyglot notebook and discuss advantages and limitations of our approach.
An Optimization-Driven Incremental Inline Substitution Algorithm for Just-in-Time Compilers
Inlining is one of the most important compiler optimizations. It reduces call overheads and widens the scope of other optimizations. However, inlining remains something of a black art in an optimizing compiler, and has been characterized as a computationally intractable problem. Intricate heuristics, tuned during countless hours of compiler engineering, are often at the core of an inliner implementation. And despite decades of research, well-established inlining heuristics are still missing. In this paper, we describe a novel inlining algorithm for JIT compilers that incrementally explores a program's call graph, and alternates between inlining and optimizations. We devise three novel heuristics that guide our inliner: adaptive decision thresholds, callsite clustering, and deep inlining trials. We implement the algorithm inside Graal, a dynamic JIT compiler for the HotSpot JVM. We evaluate our algorithm on a set of industry-standard benchmarks, including Java DaCapo, Scalabench, Spark-Perf, STMBench7 and other benchmarks, and we conclude that it significantly improves performance, surpassing state-of-the-art inlining approaches with speedups ranging from 5% up to 3×.
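As a toy illustration of one of these heuristics, adaptive decision thresholds, the following Java sketch (numbers and names invented) raises the benefit a candidate must show as the inliner expands deeper, so exploration tapers off gradually rather than stopping at a fixed size limit.

    // Hypothetical sketch, not Graal's actual inliner.
    final class AdaptiveInliner {
        private static final double BASE_THRESHOLD = 1.0;

        // The deeper the call tree has already been expanded, the more benefit
        // a further candidate must demonstrate to be inlined.
        boolean shouldInline(double estimatedBenefit, int explorationDepth) {
            double threshold = BASE_THRESHOLD * Math.pow(1.5, explorationDepth);
            return estimatedBenefit > threshold;
        }
    }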
Private Federated Learning with Domain Adaptation.
Federated Learning (FL) is a distributed machine learning (ML) paradigm that enables multiple parties to jointly re-train a shared model without sharing their data with any other parties, offering advantages in both scale and privacy. We propose a framework to augment this collaborative model-building with per-user domain adaptation. We show that this technique improves model accuracy for all users, using both real and synthetic data, and that this improvement is much more pronounced when differential privacy bounds are imposed on the FL model.
Real Time Empirical Synchronization of IoT Signals for Improved AI Prognostics
A significant challenge for Machine Learning (ML) prognostic analyses of large-scale time series databases is variable clock skew among multiple data acquisition (DAQ) systems across assets in a fleet of monitored assets, and even inside individual assets, where the sheer numbers of sensors being deployed are so large that multiple individual DAQs, each with their own internal clocks, can create significant clock-mismatch issues. For Big Data prognostic anomaly detection, we have discovered and amply demonstrated that variable clock skew issues in the timestamps for time series telemetry signatures cause poor performance for ML prognostics, resulting in high false-alarm and missed-alarm probabilities (FAPs and MAPs). This paper describes a new Analytical Resampling Process (ARP) that embodies novel techniques in the time domain and frequency domain for interpolative online normalization and optimal phase coherence so that all system telemetry time series outputs are available in a uniform format and aligned with a common sampling frequency. More importantly, the “optimality” of the proposed technique gives end users the ability to select between “ultimate accuracy” and “lowest overhead compute cost” for automated coherence synchronization of collections of time series signatures, whether from a few sensors, or hundreds of thousands of sensors, and regardless of the sampling rates and signal-to-noise (S/N) ratios for those sensors.
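The time-domain half of such a normalization step can be pictured as follows: linearly interpolate each sensor's samples onto a shared, uniformly spaced clock so all signals end up with common timestamps. The Java sketch below is a toy under that assumption and omits the frequency-domain phase-coherence machinery the paper describes.

    final class ClockAligner {
        // Assumes 't' is strictly increasing with at least two samples and
        // 'commonGrid' lies roughly within the observed time span.
        static double[] resample(double[] t, double[] v, double[] commonGrid) {
            double[] out = new double[commonGrid.length];
            int j = 0;
            for (int i = 0; i < commonGrid.length; i++) {
                double ts = commonGrid[i];
                // Advance to the sample interval containing the grid point.
                while (j < t.length - 2 && t[j + 1] < ts) j++;
                double frac = (ts - t[j]) / (t[j + 1] - t[j]);
                frac = Math.max(0.0, Math.min(1.0, frac)); // clamp at the edges
                out[i] = v[j] + frac * (v[j + 1] - v[j]);
            }
            return out;
        }
    }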
Telemetry Parameter Synthesis System for Enhanced Tuning and Validation of Machine Learning Algorithmics
Advanced machine learning (ML) prognostics are leading to increasing Return-on-Investment (ROI) for dense-sensor Internet-of-Things (IoT) applications across multiple industries including Utilities, Oil-and-Gas, Manufacturing, Transportation, and for business-critical assets in enterprise and cloud data centers. For all of these IoT prognostic applications, a nontrivial challenge for data scientists is acquiring enough time series data from executing assets with which to evaluate, tune, optimize, and validate important prognostic functional requirements that include false-alarm and missed-alarm probabilities (FAPs, MAPs), time-to-detect (TTD) metrics for early-warning of incipient issues in monitored components and systems, and overhead compute cost (CC) for real-time stream ML prognostics. In this paper we present a new data synthesis methodology called the Telemetry Parameter Synthesis System (TPSS) that can take any limited chunk of real sensor telemetry from monitored assets, decompose the sensor signals into deterministic and stochastic components, and then generate millions of hours of high-fidelity synthesized telemetry signals that possess exactly the same serial correlation structure and statistical idiosyncrasies (resolution, variance, skewness, kurtosis, auto-correlation content, and spikiness) as the real telemetry signals from the IoT monitored critical assets. The synthesized signals bring significant value-add for ML data science researchers for evaluation and tuning of candidate ML algorithmics and for offline validation of important prognostic functional requirements including sensitivity, false alarm avoidance, and overhead compute cost. The TPSS has become an indispensable tool in Oracle’s ongoing development of innovative diagnostic/prognostic algorithms for dense-sensor predictive maintenance applications in multiple industries.
PerfIso: Performance Isolation for Commercial Latency-Sensitive Services
Large commercial latency-sensitive services, such as web search, run on dedicated clusters provisioned for peak load to ensure responsiveness and tolerate data center outages. As a result, the average load is far lower than the peak load used for provisioning, leading to resource under-utilization. The idle resources can be used to run batch jobs, completing useful work and reducing overall data center provisioning costs. However, this is challenging in practice due to the complexity and stringent tail-latency requirements of latency-sensitive services. Left unmanaged, the competition for machine resources can lead to severe response-time degradation and unmet service-level objectives (SLOs). This work describes PerfIso, a performance isolation framework which has been used for nearly three years in Microsoft Bing, a major search engine, to colocate batch jobs with production latency-sensitive services on over 90,000 servers. We discuss the design and implementation of PerfIso, and conduct an experimental evaluation in a production environment. We show that colocating CPU-intensive jobs with latency-sensitive services increases average CPU utilization from 21% to 66% for off-peak load without impacting tail latency.
An early look at the LDBC social network benchmark's business intelligence workload
In this short paper, we provide an early look at the LDBC Social Network Benchmark's Business Intelligence (BI) workload which tests graph data management systems on a graph business analytics workload. Its queries involve complex aggregations and navigations (joins) that touch large data volumes, which is typical in BI workloads, yet they depend heavily on graph functionality such as connectivity tests and path finding. We outline the motivation for this new benchmark, which we derived from many interactions with the graph database industry and its users, and situate it in a scenario of social network analysis. The workload was designed by taking into account technical "chokepoints" identified by database system architects from academia and industry, which we also describe and map to the queries. We present reference implementations in openCypher, PGQL, SPARQL, and SQL, and preliminary results of SNB BI on a number of graph data management systems.
Live Multi-language Development and Runtime Environments
Context: Software development tools should work and behave consistently across different programming languages, so that developers do not have to familiarize themselves with new tooling for new languages. Also, being able to combine multiple programming languages in a program increases reusability, as developers do not have to recreate software frameworks and libraries in the language they develop in and can reuse existing software instead.
Inquiry: However, developers often have a broad choice of tools, some of which are designed for only one specific programming language. Various Integrated Development Environments have support for multiple languages, but are usually unable to provide a consistent programming experience due to different language-specific runtime features. With regard to language integrations, common mechanisms usually use abstraction layers, such as the operating system or a network connection, which are often boundaries for tools and hence negatively affect the programming experience.
Approach: In this paper, we present a novel approach for tool reuse that aims to improve the experience with regard to working with multiple high-level dynamic, object-oriented programming languages. As part of this, we build a multi-language virtual execution environment and reuse Smalltalk’s live programming tools for other languages.
Knowledge: An important part of our approach is to retrofit and align runtime capabilities for different languages as it is a requirement for providing consistent tools. Furthermore, it provides convenient means to reuse and even mix software libraries and frameworks written in different languages without breaking tool support.
Grounding: The prototype system Squimera is an implementation of our approach and demonstrates that it is possible to reuse both development tools from a live programming system to improve the development experience as well as software artifacts from different languages to increase productivity.
Importance: In the domain of polyglot programming systems, most research has focused on the integration of different languages and corresponding performance optimizations. Our work, on the other hand, focuses on tooling and the overall programming experience.
A Parallel and Scalable Processor for JSON Data.
Increasing interest in JSON data has created a need for its efficient processing. Although JSON is a simple data exchange format, its querying is not always effective, especially in the case of large repositories of data. This work aims to integrate the JSONiq extension to the XQuery language specification into an existing query processor (Apache VXQuery) to enable it to query JSON data in parallel. VXQuery is built on top of Hyracks (a framework that generates parallel jobs) and Algebricks (a language-agnostic query algebra toolbox) and can process data on the fly, in contrast to other well-known systems which need to load data first. Thus, the extra cost of data loading is eliminated. In this paper, we implement three categories of rewrite rules which exploit the features of the above platforms to efficiently handle path expressions along with introducing intra-query parallelism. We evaluate our implementation using a large (803GB) dataset of sensor readings. Our results show that the proposed rewrite rules lead to efficient and scalable parallel processing of JSON data.
Dominance-Based Duplication Simulation (DBDS)
Compilers perform a variety of advanced optimizations to improve the quality of the generated machine code. However, optimizations that depend on the data flow of a program are often limited by control flow merges. Code duplication can solve this problem by hoisting, i.e. duplicating, instructions from merge blocks to their predecessors. However, finding optimization opportunities enabled by duplication is a non-trivial task that requires compile-time-intensive analysis. This imposes a challenge on modern (just-in-time) compilers: Duplicating instructions tentatively at every control flow merge is not feasible because excessive duplication leads to uncontrolled code growth and increased compile time. Therefore, compilers need to find out whether a duplication is beneficial enough to be performed. This paper proposes a novel approach to determine which duplication operations should be performed to increase performance. The approach is based on a duplication simulation that enables a compiler to evaluate different success metrics per potential duplication. Using this information, the compiler can then select the most promising candidates for optimization. We show how to map duplication candidates into an optimization cost model that allows us to trade-off between different success metrics including peak performance, code size and compile time. We implemented the approach on top of the GraalVM and evaluated it with the benchmarks Java DaCapo, Scala DaCapo, JavaScript Octane and a micro-benchmark suite, in terms of performance, compilation time and code size increase.
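A schematic view of the resulting trade-off decision, with invented weights: each simulated duplication yields an estimated benefit that is weighed against its code-size and compile-time cost, and only candidates whose benefit exceeds that cost are applied.

    // Hypothetical sketch of the cost-model trade-off, not the DBDS code.
    final class DuplicationCandidate {
        double estimatedPeakSpeedup;   // from simulating the duplication
        int codeSizeIncreaseInNodes;   // instructions that would be duplicated
        double compileTimeCostMillis;  // extra compilation effort

        // Weights let the compiler trade peak performance against code size
        // and compile time, as the cost model in the paper does.
        boolean worthDuplicating(double sizeWeight, double timeWeight) {
            double cost = sizeWeight * codeSizeIncreaseInNodes
                    + timeWeight * compileTimeCostMillis;
            return estimatedPeakSpeedup > cost;
        }
    }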
Sulong, and Thanks For All the Bugs: Finding Errors in C Programs by Abstracting from the Native Execution Model
In C, memory errors such as buffer overflows are among the most dangerous software errors; as we show, they are still on the rise. Current dynamic bug finding tools that try to detect such errors are based on the low-level execution model of the machine. They insert additional checks in an ad-hoc fashion, which makes them prone to forgotten checks for corner-cases. To address this issue, we devised a novel approach to find bugs during the execution of a program. At the core of this approach lies an interpreter that is written in a high-level language that performs automatic checks (such as bounds checks, NULL checks, and type checks). By mapping C data structures to data structures of the high-level language, accesses are automatically checked and bugs are found. We implemented this approach and show that our tool (called Safe Sulong) can find bugs that have been overlooked by state-of-the-art tools, such as out-of-bounds accesses to the main function arguments. Additionally, we demonstrate that the overheads are low enough to make our tool practical, both during development and in production for safety-critical software projects.
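The essence of the approach can be conveyed in a few lines of Java: representing a C allocation as a managed object means every access is automatically bounds-checked, and use-after-free becomes a detectable state rather than silent corruption. This is a conceptual sketch, not Safe Sulong's actual representation.

    // A C allocation modeled as a Java object; raw pointer arithmetic is
    // replaced by checked array indexing.
    final class ManagedAllocation {
        private byte[] data; // null after free()

        ManagedAllocation(int size) { this.data = new byte[size]; }

        byte read(int index) {
            if (data == null) throw new IllegalStateException("use after free");
            return data[index]; // out-of-bounds raises an exception, not corruption
        }

        void write(int index, byte value) {
            if (data == null) throw new IllegalStateException("use after free");
            data[index] = value;
        }

        void free() { data = null; }
    }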
It's Time for Secure Languages (SPLASH-I slides)
Language designers and developers want better ways to write good code: languages designed with simpler, more powerful abstractions accessible to a larger community of developers. However, language design does not seem to take security into account, leaving developers with the onerous task of writing attack-proof code. In 20 years, we have gone from 25 reported vulnerabilities to 6,000+ vulnerabilities reported in a year. The top two types of vulnerabilities for the past few years have been known for over 15 years. I’ll summarise data on vulnerabilities during 2013-2015 and argue that our languages must take security seriously. Languages need security-oriented constructs, and compilers must let developers know when there is a problem with their code. We need to empower developers with the concept of “security for the masses” by making available languages that do not necessarily require an expert in order to determine whether the code being written is vulnerable to attack or not.
It's Time for Secure Languages (slides)
Slides summarising data from the National Vulnerability Database for the past 4 years pointing at the need for better language design.
Making collection operations optimal with aggressive JIT compilation
Functional collection combinators are a neat and widely accepted data processing abstraction. However, their generic nature results in high abstraction overheads -- Scala collections are known to be notoriously slow for typical tasks. We show that proper optimizations in a JIT compiler can largely eliminate the overheads imposed by these abstractions. Using the open-source Graal JIT compiler, we achieve speedups of up to 20x on collection workloads compared to the standard HotSpot C2 compiler. Consequently, a sufficiently aggressive JIT compiler allows the language compiler, such as Scalac, to focus on other concerns. In this paper, we show how optimizations, such as inlining, polymorphic inlining, and partial escape analysis, are combined in Graal to produce collections code that is optimal with respect to manually written code, or close to optimal. We argue why some of these optimizations are more effectively done by a JIT compiler. We then identify specific use-cases that most current JIT compilers do not optimize well, warranting special treatment from the language compiler.
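The shape of the workload, transliterated to Java for illustration (the paper's benchmarks use Scala collections): the combinator pipeline allocates closures and iterator machinery that the JIT must inline and escape-analyze away before it can match the hand-written loop.

    import java.util.List;

    final class CombinatorExample {
        // Abstraction-heavy form: lambdas, boxing, and stream plumbing.
        static int viaCombinators(List<Integer> xs) {
            return xs.stream().filter(x -> x % 2 == 0).mapToInt(x -> x * x).sum();
        }

        // The code an aggressive JIT should effectively reduce the above to.
        static int viaManualLoop(List<Integer> xs) {
            int sum = 0;
            for (int x : xs) {
                if (x % 2 == 0) sum += x * x;
            }
            return sum;
        }
    }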
Evaluating quality of security testing of the JDK.
In this position paper we describe how mutation testing can be used to evaluate the quality of test suites from a security viewpoint. Our focus is on measuring the quality of the test suite associated with the Java Development Kit (JDK) because it provides the core security properties for all applications. We describe the challenges associated with identifying security-specific mutation operators that are specific to the Java model and ensuring that our solution can be automated for large code-bases like the JDK.
Behavior Based Approach to Misuse Detection of a Simulated SCADA System
This paper presents the initial findings in applying a behavior-based approach for detection of unauthorized activities in a simulated Supervisory Control and Data Acquisition (SCADA) system. Misuse detection of this type utilizes fault-free system telemetry to develop empirical models that learn normal system behavior. Future monitored telemetry sources that show statistically significant deviations from this learned behavior may indicate an attack or other unwanted actions. The experimental test bed consists of a set of Linux based enterprise servers that were isolated from a larger university research cluster. All servers are connected to a private network and simulate several components and tasks seen in a typical SCADA system. Telemetry sources included kernel statistics, resource usages and internal system hardware measurements. For this study, the Auto Associative Kernel Regression (AAKR) and Auto Associative Multivariate State Estimation Technique (AAMSET) are employed to develop empirical models. Prognostic efficacy of these methods for computer security used several groups of signals taken from available telemetry classes. The Sequential Probability Ratio Test (SPRT) is used along with these models for intrusion detection purposes. The intrusion types tested include host/network discovery, DoS, brute force login, privilege escalation, and malicious exfiltration actions. In this study, all intrusion types altered the residuals of much of the monitored telemetry and were detected in all signal groups used by both model types. The methods presented can be extended to industries besides nuclear that use SCADA or business-critical networks.
Simulation-based Code Duplication for Enhancing Compiler Optimizations
Compiler optimizations are often limited by control flow, which prohibits optimizations across basic block boundaries. Duplicating instructions from merge blocks to their predecessors enlarges basic blocks and can thus enable further optimizations. However, duplicating too many instructions leads to excessive code growth. Therefore, an approach is necessary that avoids code explosion and still finds beneficial duplication candidates. We present a novel approach to determine which code should be duplicated to improve peak performance. To this end, we analyze duplication candidates for subsequent optimizations by simulating a duplication and analyzing its impact on the compilation unit. This allows a compiler to find those duplication candidates that have the maximum optimization potential.
SIMULATE & DUPLICATE
Poster about simulation-based code duplication (abstract from the associated DocSymp paper). The scope of compiler optimizations is often limited by control flow, which prohibits optimizations across basic block boundaries. Code duplication can solve this problem by extending basic block sizes, thus enabling subsequent optimizations. However, duplicating code for every optimization opportunity may lead to excessive code growth. Therefore, a holistic approach is required that is capable of finding optimization opportunities and classifying their impact. This paper presents a novel approach to determine which code should be duplicated in order to improve peak performance. The approach analyzes duplication candidates for subsequent optimization opportunities. It does so by simulating a duplication operation and analyzing its impact on other optimizations. This allows a compiler to weigh multiple success metrics in order to choose the code duplication operations with the maximum optimization potential. We further show how to map code duplication opportunities to an optimization cost model that allows us to maximize performance while minimizing code size increase.
Detecting Malicious JavaScript in PDFs Using Conservative Abstract Interpretation
To mitigate the risk posed by JavaScript-based PDF malware, we propose a static analysis technique based on abstract interpretation. Our evaluation shows that our approach can identify 100% of malware with a low rate of false positives.
FastR update: Interoperability, Graphics, Debugging, Profiling, and other hot topics
This talk presents an overview of FastR's progress in a number of areas that advanced significantly in the last year, e.g., Interoperability, Graphics, Debugging, and Compatibility.
BDgen: A Universal Big Data Generator
This paper introduces BDgen, a generator of Big Data targeting various types of users, implemented as a general and easily extensible framework. It is divided into a scalable backend designed to generate Big Data on clusters and a frontend for user-friendly definition of the structure of the required data, or its automatic inference from a sample data set. In the first release we have implemented generators for two commonly used formats (JSON and CSV) and support for general grammars. We have also performed preliminary experimental comparisons confirming the advantages and competitiveness of the solution.
Zero-overhead R and C/C++ integration with FastR
Traditionally, C and C++ are often used to improve performance for R applications and packages. While this is usually not necessary when using FastR, because it can run R code at near-native performance, there is a large corpus of existing code that implements critical pieces of functionality in native code. Alternative implementations of R need to simulate the R native API, a complex API that exposes many implementation details. Simulating it takes significant effort, incurs performance overhead, and introduces a compilation and optimization barrier between languages. FastR can instead employ the Truffle framework to run native code, available as LLVM bitcode, inside the optimization scope of the polyglot environment, and thus have it integrated with no optimization or integration barriers.
Trace Register Allocation Policies: Compile-time vs. Performance Trade-offs
Register allocation has to be done by every compiler that targets a register machine, regardless of whether it aims for fast compilation or optimal code quality. State-of-the-art dynamic compilers often use global register allocation approaches such as linear scan. Recent results suggest that non-global trace-based register allocation approaches can compete with global approaches in terms of allocation quality. Instead of processing the whole compilation unit at once, a trace-based register allocator divides the problem into linear code segments, called traces. In this work, we present a register allocation framework that can exploit the additional flexibility of traces to select different allocation strategies based on the characteristics of a trace. This allows fine-grained control over the compile time vs. peak performance trade-off. Our framework features three allocation strategies, a linear-scan-based approach that achieves good code quality, a single-pass bottom-up strategy that aims for short allocation times, and an allocator for trivial traces. We present 6 allocation policies to decide which strategy to use for a given trace. The evaluation shows that this approach can reduce allocation time by 3-43% at a peak performance penalty of about 0-9% on average. For systems that do not mainly focus on peak performance, our approach allows adjusting the time spent for register allocation, and therefore the overall compilation time, finding the optimal balance between compile time and peak performance according to an application’s requirements.
Practical partial evaluation for high-performance dynamic language runtimes
Most high-performance dynamic language virtual machines duplicate language semantics in the interpreter, compiler, and runtime system. This violates the principle of not repeating yourself. In contrast, we define languages solely by writing an interpreter. The interpreter performs specializations, e.g., augments the interpreted program with type information and profiling information. Compiled code is derived automatically using partial evaluation while incorporating these specializations. This makes partial evaluation practical in the context of dynamic languages: It reduces the size of the compiled code while still compiling all parts of an operation that are relevant for a particular program. When a speculation fails, execution transfers back to the interpreter, the program re-specializes in the interpreter, and later partial evaluation again transforms the new state of the interpreter to compiled code. We evaluate our approach by comparing our implementations of JavaScript, Ruby, and R with best-in-class specialized production implementations. Our general-purpose compilation system is competitive with production systems even when they have been heavily optimized for the one language they support. For our set of benchmarks, our speedup relative to the V8 JavaScript VM is 0.83x, relative to JRuby is 3.8x, and relative to GNU R is 5x.
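The interpreter-side specialization described above can be sketched in plain Java (this deliberately avoids the real Truffle API): a node starts uninitialized, rewrites itself to an int-only fast path after observing ints, and falls back to a generic node when the speculation fails. Partial evaluation of the specialized state is what then yields compact compiled code.

    final class SpecializingAdd {
        private AddNode node = new UninitializedNode();

        Object execute(Object l, Object r) { return node.execute(l, r); }

        abstract static class AddNode { abstract Object execute(Object l, Object r); }

        final class UninitializedNode extends AddNode {
            @Override Object execute(Object l, Object r) {
                // First execution: specialize based on the observed types.
                node = (l instanceof Integer && r instanceof Integer)
                        ? new IntAddNode() : new GenericAddNode();
                return node.execute(l, r);
            }
        }

        final class IntAddNode extends AddNode {
            @Override Object execute(Object l, Object r) {
                if (l instanceof Integer li && r instanceof Integer ri) {
                    return li + ri;          // fast path compiled code relies on
                }
                node = new GenericAddNode(); // speculation failed: rewrite
                return node.execute(l, r);
            }
        }

        static final class GenericAddNode extends AddNode {
            @Override Object execute(Object l, Object r) {
                return ((Number) l).doubleValue() + ((Number) r).doubleValue();
            }
        }
    }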
SOAP 2017 Presentation - An Efficient Tunable Selective Points-to Analysis for Large Codebases
Points-to analysis is a fundamental static program analysis technique for tools including compilers and bug-checkers. Although object-based context sensitivity is known to improve the precision of points-to analysis, scaling it for large Java codebases remains a challenge. In this work, we develop a tunable, client-independent, object-sensitive points-to analysis framework where heap cloning is applied selectively. This approach is aimed at large codebases where standard analysis is typically expensive. Our design includes a pre-analysis that determines program points that contribute to the cost of an object-sensitive points-to analysis. A subsequent analysis then determines the context depth for each allocation site. While our framework can run standalone, it is also possible to tune it: the user of the framework can use knowledge of the codebase being analysed to influence the selection of expensive program points as well as the process to differentiate the required context depth. Overall, the approach determines where cloning is beneficial and where it is unlikely to be beneficial. We have implemented our approach using Soufflé (a Datalog compiler) and an extension of the DOOP framework. Our experiments on large programs, including OpenJDK, show that our technique is efficient and precise. For OpenJDK, our analysis reduces runtime by 27% and memory usage by 18% for a negligible loss of precision, while for Jython from the DaCapo benchmark suite it reduces runtime by 91% with no loss of precision.
Lenient Execution of C on a JVM -- How I Learned to Stop Worrying and Execute the Code
Most C programs do not strictly conform to the C standard, and often exhibit undefined behavior, e.g., on signed integer overflow. When compiled by non-optimizing compilers, such programs often behave as the programmer intended. However, optimizing compilers may exploit undefined semantics for more aggressive optimizations, thus possibly breaking the code. Analysis tools can help to find and fix such issues. Alternatively, one could define a C dialect in which clear semantics are defined for frequent program patterns whose behavior would otherwise be undefined. In this paper, we present such a dialect, called Lenient C, that specifies semantics for behavior that the standard leaves open for interpretation. Specifying additional semantics enables programmers to safely rely on otherwise undefined patterns. Lenient C is designed to be executed on a managed runtime such as the JVM. We demonstrate how we implemented the dialect in Safe Sulong, a C interpreter with a dynamic compiler that runs on the JVM.
Polyglot Native: Scala, Kotlin, and Other JVM-Based Languages with Instant Startup and Low Footprint
Execution of JVM-based programs uses bytecode loading and interpretation, just-in-time compilation, and monolithic heaps. This causes JVM-based programs to start up slowly with a high memory footprint. In recent years, different projects have been developed to address these issues: ahead-of-time compilation for the JVM (JEP 295) improves JVM startup time, while Scala Native and Kotlin/Native provide language-specific solutions by compiling code with LLVM and providing language-specific runtimes. We present Polyglot Native: an ahead-of-time compiler for Java bytecode combined with a low-footprint VM. With Polyglot Native, programs written in Kotlin, Scala, and other JVM-based languages have minimal startup time as they are compiled to native executables. The footprint of compiled programs is minimized by using a chunked heap and reducing necessary program metadata. In this talk, we show the architecture of Polyglot Native and compare it to existing projects. Then, we live-demo a project that compiles code from Kotlin, Scala, Java, and C into a single binary executable. Finally, we discuss the intricacies of interoperability between Polyglot Native and C.
Truffle: your favorite language on JVM
Graal/Truffle is a project that aims to build a multi-language, multi-tenant, multi-threaded, multi-node, multi-tooling and multi-system environment on top of the JVM. Imagine that in order to develop a (dynamic) language implementation all you need is to write its interpreter in Java, and immediately you get amazing peak performance, a choice of several carefully tuned garbage collectors, tooling support, high-speed interoperability with other languages and more. In this talk we'll take a look at how Truffle and Graal can achieve this and demonstrate the results on Ruby, JavaScript and R. Particular attention will be given to FastR, the Truffle-based R language implementation, its performance compared to GNU R, and its support for Java interoperability including graphics.
Evaluating Quality of Security Testing of the JDK
The document outlines the main challenges in evaluating test suites that check for security properties. Specifically, it considers testing the security properties of the JDK.
UMASS Data Science Talks
I'll be giving two talks at the UMASS data science event. The first talk is on our multilingual word embedding work. The second talk is on our constrained-inference approach for sequence-to-sequence neural networks. Relevant IP is covered in two patents and both pieces of work have previously been approved for publication (patent ref numbers and archivist ids provided below).
Pandia: comprehensive contention-sensitive thread placement.
Pandia is a system for modelling the performance of in-memory parallel workloads. It generates a description of a workload from a series of profiling runs, and combines this with a description of the machine's hardware to model the workload's performance over different thread counts and different placements of those threads.
The approach is “comprehensive” in that it accounts for contention at multiple resources such as processor functional units and memory channels. The points of contention for a workload can shift between resources as the degree of parallelism and thread placement changes. Pandia accounts for these changes and provides a close correspondence between predicted performance and actual performance. Testing a set of 22 benchmarks on 2-socket Intel machines fitted with chips ranging from Sandy Bridge to Haswell, we see median differences of 1.05% to 0% between the fastest predicted placement and the fastest measured placement, and median errors of 8% to 4% across all placements.
Pandia can be used to optimize the performance of a given workload, for instance by identifying whether or not multiple processor sockets should be used, and whether or not the workload benefits from using multiple threads per core. In addition, Pandia can be used to identify opportunities for reducing resource consumption where additional resources are not matched by additional performance, for instance by limiting a workload to a small number of cores when its scaling is poor.
Better Splittable Pseudorandom Number Generators (and Almost As Fast)
We have tested and analyzed the SplitMix pseudorandom number generator algorithm presented by Steele, Lea, and Flood (2014), and have discovered two additional classes of gamma values that produce weak pseudorandom sequences. In this paper we present a modification to the SplitMix algorithm that avoids all three classes of problematic gamma values, and also a completely new algorithm for splittable pseudorandom number generators, which we call TwinLinear. Like SplitMix, TwinLinear provides both a generate operation that returns one (64-bit) pseudorandom value and a split operation that produces a new generator instance that with very high probability behaves as if statistically independent of all other instances. Also like SplitMix, TwinLinear requires no locking or other synchronization (other than the usual memory fence after instance initialization), and is suitable for use with SIMD instruction sets because it has no branches or loops. The TwinLinear algorithm is the result of a systematic exploration of a substantial space of nonlinear mixing functions that combine the output of two independent generators of (perhaps not very strong) pseudorandom number sequences. We discuss this design space and our strategy for exploring it. We used the PractRand test suite (which has provision for failing fast) to filter out poor candidates, then used TestU01 BigCrush to verify the quality of candidates that withstood PractRand. We present results of analysis and extensive testing on TwinLinear (using both TestU01 and PractRand). Single instances of TwinLinear have no known weaknesses, and TwinLinear is significantly more robust than SplitMix against accidental correlation in a multithreaded setting. It is slightly more costly than SplitMix (10 or 11 64-bit arithmetic operations per 64 bits generated, rather than 9) but has a shorter critical path (5 or 6 operations rather than 8). We believe that TwinLinear is suitable for the same sorts of applications as SplitMix, that is, "everyday" scientific and machine-learning applications (but not cryptographic applications), especially when concurrent threads or distributed processes are involved.
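For readers who want the baseline in code, the generate/split structure of SplitMix looks roughly like the following Java sketch. The mixing constants are the widely published SplitMix64 finalizer constants; the paper's gamma-value repair and the TwinLinear algorithm itself are deliberately not reproduced here.

    // Rough sketch of the SplitMix structure: each instance steps a seed by a
    // per-instance odd "gamma" and mixes the result; split() derives a new
    // instance intended to behave as statistically independent. The paper
    // shows that certain gamma values are weak; its fix is not shown here.
    final class SplitMixSketch {
        private long seed;
        private final long gamma;   // must be odd

        SplitMixSketch(long seed, long gamma) {
            this.seed = seed;
            this.gamma = gamma | 1L;
        }

        private long nextSeed() { return seed += gamma; }

        private static long mix64(long z) {
            z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
            z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
            return z ^ (z >>> 31);
        }

        long nextLong() { return mix64(nextSeed()); }   // the generate operation

        SplitMixSketch split() {                        // the split operation
            return new SplitMixSketch(nextLong(), nextLong());
        }
    }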
LabelBank: Revisiting Global Perspectives for Semantic Segmentation
Semantic segmentation requires a detailed labeling of image pixels by object category. Information derived from local image patches is necessary to describe the detailed shape of individual objects. However, this information is ambiguous and can result in noisy labels. Global inference of image content can instead capture the general semantic concepts present. We advocate that holistic inference of image concepts provides valuable information for detailed pixel labeling. We propose a generic framework to leverage holistic information in the form of a LabelBank for pixel-level segmentation. We show the ability of our framework to improve semantic segmentation performance in a variety of settings. We learn models for extracting a holistic LabelBank from visual cues, attributes, and/or textual descriptions. We demonstrate improvements in semantic segmentation accuracy on standard datasets across a range of state-of-the-art segmentation architectures and holistic inference approaches.
PGX.UI: Visual Construction and Exploration of Large Property Graphs
Transforming existing data into graph formats and visualizing large graphs in a comprehensible way are two key areas of interest in information visualization. Addressing these issues requires new visualization approaches for large graphs that support users with graph construction and exploration. In addition, graph visualization is becoming more important for existing graph processing systems, which are often based on the property graph model. This paper therefore presents concepts for visually constructing property graphs from data sources and a summary visualization for large property graphs. Furthermore, we introduce the concept of a graph construction timeline that keeps track of changes and provides branching and merging, in a version-control-like fashion. Finally, we present a tool that visually guides users through the graph construction and exploration process.
Building Reusable, Low-Overhead Tooling Support into a High-Performance Polyglot VM
Software development tools that interact with running programs, for instance debuggers, are presumed to demand difficult tradeoffs among performance, functionality, implementation complexity, and user convenience. A fundamental change in thinking obsoletes that presumption and enables the delivery of effective tools as a forethought, no longer an afterthought.
Self-managed collections: Off-heap memory management for scalable query-dominated collections
Explosive growth in DRAM capacities and the emergence of language-integrated query enable a new class of managed applications that perform complex query processing on huge volumes of data stored as collections of objects in the memory space of the application. While more flexible in terms of schema design and application development, this approach typically experiences sub-par query execution performance when compared to specialized systems like DBMS. To address this issue, we propose self-managed collections, which utilize off-heap memory management and dynamic query compilation to improve the performance of querying managed data through language-integrated query. We evaluate self-managed collections using both microbenchmarks and enumeration-heavy queries from the TPC-H business intelligence benchmark. Our results show that self-managed collections outperform ordinary managed collections in both query processing and memory management by up to an order of magnitude and even outperform an optimized in-memory columnar database system for the vast majority of queries.
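As a minimal illustration of the off-heap idea (not the paper's system), a collection of fixed-width records can be laid out in a direct ByteBuffer so that the records never live on the Java heap and queries scan raw memory directly:

    import java.nio.ByteBuffer;

    // Toy off-heap collection of fixed-width records (id: long, price: double)
    // stored in a direct ByteBuffer outside the Java heap; illustrative only.
    final class OffHeapOrders {
        private static final int RECORD_BYTES = Long.BYTES + Double.BYTES;
        private final ByteBuffer buf;
        private int size;

        OffHeapOrders(int capacity) {
            buf = ByteBuffer.allocateDirect(capacity * RECORD_BYTES);
        }

        void add(long id, double price) {
            int base = size++ * RECORD_BYTES;
            buf.putLong(base, id);
            buf.putDouble(base + Long.BYTES, price);
        }

        // A query such as "sum of prices over an id range" scans raw memory,
        // with no object materialization and no GC pressure.
        double sumPricesWithIdAbove(long minId) {
            double sum = 0;
            for (int i = 0; i < size; i++) {
                int base = i * RECORD_BYTES;
                if (buf.getLong(base) > minId) {
                    sum += buf.getDouble(base + Long.BYTES);
                }
            }
            return sum;
        }
    }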
Language-Independent Information Flow Tracking Engine for Program Comprehension Tools
Program comprehension tools are often developed for a specific programming language. Developing such a tool from scratch requires significant effort. In this paper, we report on our experience developing a language-independent framework that enables the creation of program comprehension tools, specifically tools gathering insight from deep dynamic analysis, with little effort. Our framework is language-independent because it is built on top of Truffle, an open-source platform, developed in Oracle Labs, for implementing dynamic languages in the form of AST interpreters. Our framework supports the creation of a diverse variety of program comprehension techniques, such as query, program slicing, and back-in-time debugging, because it is centered around a powerful information-flow tracking engine. Tools developed with our framework get access to the information flow through a program execution. While it is possible to develop similarly powerful tools without our framework, for example by tracking information flow through bytecode instrumentation, our approach leads to information that is closer to source code constructs, and thus more comprehensible to the user. To demonstrate the effectiveness of our framework, we applied it to two Truffle-based languages, namely Simple Language and TruffleRuby, and we distill our experience into guidelines for developers of other Truffle-based languages who want to develop program comprehension tools for their language.
An Efficient Tunable Selective Points-to Analysis for Large Codebases
Points-to analysis is a fundamental static program analysis technique for tools including compilers and bug-checkers. Although object-based context sensitivity is known to improve the precision of points-to analysis, scaling it for large Java codebases remains a challenge. In this work, we develop a tunable, client-independent, object-sensitive points-to analysis framework where heap cloning is applied selectively. This approach is aimed at large codebases where standard analysis is typically expensive. Our design includes a pre-analysis that determines program points that contribute to the cost of an object-sensitive points-to analysis. A subsequent analysis then determines the context depth for each allocation site. While our framework can run standalone, it is also possible to tune it – the user of the framework can use knowledge of the codebase being analysed to influence the selection of expensive program points as well as the process of differentiating the required context depth. Overall, the approach determines where cloning is beneficial and where it is unlikely to be beneficial. We have implemented our approach using Soufflé (a Datalog compiler) and an extension of the DOOP framework. Our experiments on large programs, including OpenJDK, show that our technique is efficient and precise. For the OpenJDK, our analysis reduces runtime by 27% and memory usage by 18% for a negligible loss of precision, while for Jython from the DaCapo benchmark suite, the same analysis reduces runtime by 91% with no loss of precision.
Dynamic Compilation and Run-Time Optimization
Lecture slides about "Dynamic Compilation and Run-Time Optimization", held at the University of Augsburg. Contents: technology invented by Self (Inline Caching, Deoptimization, ...), and a Truffle and Graal tutorial.
HeadacheCoach: Towards Headache Prevention by Sensing and Making Sense of Personal Lifestyle Data
Estimates are that almost half of the world’s population has an active primary headache disorder, i.e. one with no underlying illness as its cause. These disorders can start manifesting in early adulthood and can persist for the rest of the sufferer’s life. Most specialists concur that sudden changes in daily lifestyle, such as sleep rhythm, nutrition behavior or stress experience, can be valid triggers for headache sufferers. Health care professionals recommend keeping a diary to self-monitor personal headache triggers in order to learn to avoid headache attacks. However, making sense of this data is difficult. Although existing smartphone approaches in the literature have evaluated behavior change support systems for headaches, they have failed to provide appropriate feedback on the collected daily data to show what causes or prevents an individual’s headache attacks. In this paper, we present HeadacheCoach, a smartphone app that tracks headache-triggering lifestyle data and headache attacks on a daily basis, and propose a mixed-method approach to examine which feedback method(s) can best drive behavior change in order to prevent future headache attacks.
FAD.js: Fast JSON Data Access Using JIT-based Speculative Optimizations
JSON is one of the most popular data encoding formats, with wide adoption in databases and big data frameworks, and native support in popular programming languages such as JavaScript/Node.js, Python, and R. Nevertheless, JSON data manipulation can easily become a performance bottleneck in modern language runtimes due to parsing and object materialization overheads. In this paper, we introduce FAD.js, a runtime system for fast manipulation of JSON objects in data-intensive applications. FAD.js is based on speculative just-in-time compilation and on direct access to raw data. Experiments show that applications using FAD.js can achieve speedups of up to 2.7x for encoding and 9.9x for decoding JSON data when compared to state-of-the-art JSON manipulation libraries.
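The parse-on-demand part of this design can be pictured with a toy Java sketch; all names here are hypothetical, and FAD.js itself works inside the JavaScript runtime with speculation and JIT compilation rather than this naive string scanning:

    import java.nio.charset.StandardCharsets;
    import java.util.HashMap;
    import java.util.Map;

    // Toy illustration of parse-on-demand JSON access: keep the raw text and
    // decode a field only when it is first requested, caching the result.
    // Handles only flat objects with simple values; real systems do far more.
    final class LazyJsonObject {
        private final String raw;
        private final Map<String, String> cache = new HashMap<>();

        LazyJsonObject(byte[] bytes) {
            this.raw = new String(bytes, StandardCharsets.UTF_8);
        }

        String get(String name) {
            return cache.computeIfAbsent(name, n -> {
                int k = raw.indexOf('"' + n + '"');
                if (k < 0) return null;
                int start = raw.indexOf(':', k) + 1;
                int end = start;
                while (end < raw.length() && ",}".indexOf(raw.charAt(end)) < 0) end++;
                // Trim whitespace and surrounding quotes from the value slice.
                return raw.substring(start, end).trim().replaceAll("^\"|\"$", "");
            });
        }
    }

If a program touches only a handful of fields in a large document, most of the input is never decoded, which is the effect the decoding speedups above come from.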
Machine Learning for Finding Bugs: An Initial Report
Static program analysis is a technique to analyse code without executing it, and can be used to find bugs in source code. Many open source and commercial tools have been developed in this space over the past 20 years. Scalability and precision are important for the deployment of static code analysis tools - numerous false positives and slow runtimes both make such tools hard to use in development, where integration into a nightly build is the standard goal. This requires one to identify a suitable abstraction for the static analysis, which is typically a manual process and can be expensive. In this paper we report our findings on using machine learning techniques to detect defects in C programs. We use three off-the-shelf machine learning techniques and a large corpus of programs for both the training and evaluation of the results. We compare the results produced by the machine learning techniques against the Parfait static program analysis tool used internally at Oracle by thousands of developers. While on the surface the initial results were encouraging, further investigation suggests that the machine learning techniques we used are not suitable replacements for static program analysis tools due to the low precision of the results. This could be due to a variety of reasons, including not using domain knowledge such as the semantics of the programming language, and the lack of suitable data used in the training process.
Using Butterfly-Patterned Partial Sums to Draw from Discrete Distributions
We describe a SIMD technique for drawing values from multiple discrete distributions, such as sampling from the random variables of a mixture model, that avoids computing a complete table of partial sums of the relative probabilities. A table of alternate (``butterfly-patterned'') form is faster to compute, making better use of coalesced memory accesses; from this table, complete partial sums are computed on the fly during a binary search. Measurements using CUDA 7.5 on an NVIDIA Titan Black GPU show that for double-precision data, this technique makes an entire LDA machine-learning application about 25% faster than doing a straightforward matrix transposition after using coalesced accesses.
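For context, the conventional method that the butterfly-patterned table improves on looks like this in Java: materialize the complete table of partial sums for a distribution, then binary-search it with a uniform draw.

    import java.util.Random;

    // Baseline sampler: draw an index from relative probabilities by building
    // the full table of partial sums and binary searching it.
    final class DiscreteSampler {
        static int sample(double[] weights, Random rng) {
            double[] partial = new double[weights.length];
            double total = 0;
            for (int i = 0; i < weights.length; i++) {
                total += weights[i];
                partial[i] = total;            // partial[i] = w[0] + ... + w[i]
            }
            double u = rng.nextDouble() * total;
            int lo = 0, hi = weights.length - 1;
            while (lo < hi) {                  // first index with partial[i] > u
                int mid = (lo + hi) >>> 1;
                if (partial[mid] > u) hi = mid; else lo = mid + 1;
            }
            return lo;
        }
    }

The paper's contribution is to avoid materializing exactly this per-distribution table in its standard layout: the alternate butterfly-patterned form is cheaper to build with coalesced memory accesses, and the needed partial sums are recovered on the fly during the search.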
SLIDES: It's Time for a New Old Language
Slides for an invited keynote talk at PPoPP
Increasing the Robustness of C Libraries and Applications through Run-time Introspection
In C, low-level errors such as buffer overflow and use-after-free are a major problem since they cause security vulnerabilities and hard-to-find bugs. Libraries cannot apply defensive programming techniques since objects (e.g., arrays or structs) lack run-time information such as bounds, lifetime, and types. To address this issue, we devised introspection functions that empower C programmers to access run-time information about objects and variadic function arguments. Using these functions, we implemented a more robust, source-compatible version of the C standard library that validates parameters to its functions. The library functions react to otherwise undefined behavior; for example, when detecting an invalid argument, its functions return a special value (such as -1 or NULL) and set errno, or attempt to compute a meaningful result anyway. We demonstrate by examples that using introspection in the implementation of the C standard library and other libraries prevents common low-level errors, while also complementing existing approaches.
It's Time for a New Old Language
The most popular programming language in computer science has no compiler or interpreter. Its definition is not written down in any one place. It has changed a lot over the decades, and those changes have introduced ambiguities and inconsistencies. Today, dozens of variations are in use, and its complexity has reached the point where it needs to be re-explained, at least in part, every time it is used. Much effort has been spent in hand-translating between this language and other languages that do have compilers. The language is quite amenable to parallel computation, but this fact has gone unexploited. In this talk we will summarize the history of the language, highlight the variations and some of the problems that have arisen, and propose specific solutions. We suggest that it is high time that this language be given a complete formal specification, and that compilers, IDEs, and proof-checkers be created to support it, so that all the best tools and techniques of our trade may be applied to it also.
What makes TruffleRuby run Optcarrot 9 times faster than MRI?
TruffleRuby runs Optcarrot 9 times faster than MRI 2. TruffleRuby is a new optimizing implementation of Ruby. Optcarrot is a NES emulator. MRI 3 aims to run Optcarrot 3 times faster than MRI 2. We will explore the techniques that allow TruffleRuby to achieve high performance on Optcarrot. We’ll discuss splitting, inlining, array strategies, Proc elimination, etc.
Dynamic Symbolic Execution for Polymorphism
Symbolic execution is an important program analysis technique that provides auxiliary execution semantics to execute programs with symbolic rather than concrete values. There has been much recent interest in symbolic execution for automatic test case generation and security vulnerability detection, resulting in various tools being deployed in academia and industry. Nevertheless, (subtype or dynamic) polymorphism in object-oriented programs has been neglected: existing symbolic execution techniques can explore different targets of conditional branches but not different targets of method invocations. We address the problem of how this polymorphism can be expressed in a symbolic execution framework. We propose the notion of symbolic types, which make object types symbolic. With symbolic types, various targets of a method invocation can be explored systematically by mutating the type of the receiver object of the method during automatic test case generation. To the best of our knowledge, this is the first attempt to address polymorphism in symbolic execution. Mutation of method invocation targets is critical for effectively testing object-oriented programs, especially libraries. Our experimental results show that symbolic types are significantly more effective than existing symbolic execution techniques in achieving test coverage and finding bugs and security vulnerabilities in OpenJDK.
Using Butterfly-Patterned Partial Sums to Draw from Discrete Distributions
Slides for a talk to be given at ACM PPoPP on February 8, 2017. This 25-minute talk builds on the paper as accepted by PPoPP (Archivist 2016-057) and a previous version of the slides presented at NVIDIA GTC 2016 (Archivist 2016-0055). *** We describe a SIMD technique for drawing values from multiple discrete distributions, such as sampling from the random variables of a mixture model, that avoids computing a complete table of partial sums of the relative probabilities. A table of alternate ("butterfly-patterned") form is faster to compute, making better use of coalesced memory accesses; from this table, complete partial sums are computed on the fly during a binary search. Measurements using CUDA 7.5 on an NVIDIA Titan Black GPU show that this technique makes an entire machine-learning application that uses a Latent Dirichlet Allocation topic model with 1024 topics about 13% faster (when using single-precision floating-point data) or about 35% faster (when using double-precision floating-point data) than doing a straightforward matrix transposition after using coalesced accesses.
Towards Scalable Provenance Generation From Points-To Information: An Initial Experiment
Points-to analysis is often used to identify potential defects in code. The usual points-to analysis does not store the justification for the presence of a specific value in the points-to relation. But for points-to analysis to meet the needs of the programmer, the analysis needs to provide the justification for its results. Programmers will use such justification to identify the cause of the defect in the code. In this paper we describe an approach to generate provenance information in the context of points-to analysis. Our solution is to define an abstract notion of data-flow traces that is computed as a post-analysis using points-to information that has already been computed. We implemented our approach in conjunction with the DOOP framework that computes points-to information. We use four benchmarks derived from two versions of the JDK, and use two realistic clients to demonstrate the effectiveness of our solution. For instance, we show that the overhead to compute these data-flow traces is only 25% when compared to the time to compute the original points-to analysis. We also discuss some of the limitations of our approach, especially in generating precise traces.
Machine Learning For Finding Bugs: An Initial Report
Static program analysis is a technique to analyse code without executing it, and can be used to find bugs in source code. Many open source and commercial tools have been developed in this space over the past 20 years. Scalability and precision are important for the deployment of static code analysis tools - numerous false positives and slow runtimes both make such tools hard to use in development, where integration into a nightly build is the standard goal. This requires one to identify a suitable abstraction for the static analysis, which is typically a manual process and can be expensive. In this paper we report our findings on using machine learning techniques to detect defects in C programs. We use three off-the-shelf machine learning techniques and a large corpus of programs for both the training and evaluation of the results. We compare the results produced by the machine learning techniques against the Parfait static program analysis tool used internally at Oracle by thousands of developers. While on the surface the initial results were encouraging, further investigation suggests that the machine learning techniques we used are not suitable replacements for static program analysis tools due to the low precision of the results. This could be due to a variety of reasons, including not using domain knowledge such as the semantics of the programming language, and the lack of suitable data used in the training process.
Secure Information Flow by Access Control: A Security Type System of Dual-Access Labels
Programming languages such as Java and C# execute code with different levels of trust in the same process, and rely on a fine-grained access control model for users to manage the security requirements of program code from different sources. While such a security model is simple enough to be used in practice to protect systems from many hostile programs downloaded over a network, it does not guard against information-based attacks, such as confidentiality and integrity violations. We introduce a novel security model, called Dual-Access Label (DAL), to capture information-based security requirements of programs written in these languages. DAL labels extend the access control model by specifying both the accessibility and capability of program code, and use them to constrain information flows between code from different sources. Accessibility specifies the privileges necessary to access the code while capability indicates the privileges held by the code. DAL's security policy places a two-way obligation on both ends of information flow so that they must have sufficient capability to meet the accessibility of each other. Unlike traditional lattice-based security models, our security model offers more flexible information flow relations induced by the security policy, which does not have to be transitive. It provides both confidentiality and integrity guarantees while allowing cyclic information flows among code with different security labels, as desired in many applications. We present a generic security type system to enforce possibly intransitive information flow policies, including DAL, statically at compile-time. Such a security type system provides a new notion of intransitive noninterference that generalizes the standard notion of transitive noninterference in lattice-based security models.
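The flow rule stated above (each end must have capability sufficient for the other end's accessibility) can be written as a small predicate. The following Java sketch uses hypothetical types; the paper enforces the policy statically with a type system rather than with a runtime check.

    import java.util.Set;

    // Sketch of the Dual-Access Label flow rule: information may flow between
    // two labeled pieces of code only if each side's capability covers the
    // other side's accessibility requirement. Types and names are hypothetical.
    record Label(Set<String> accessibility, Set<String> capability) {}

    final class Dal {
        static boolean flowAllowed(Label from, Label to) {
            return to.capability().containsAll(from.accessibility())
                && from.capability().containsAll(to.accessibility());
        }
    }

Because the relation is checked pairwise, flowAllowed(a, b) and flowAllowed(b, c) do not imply flowAllowed(a, c), which is exactly the intransitivity the abstract describes.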
Fast, Flexible, Polyglot Instrumentation Support for Debuggers and other Tools
Software development tools that interact with running programs, for instance debuggers, are presumed to demand difficult tradeoffs among performance, functionality, implementation complexity, and user convenience. A fundamental change in thinking obsoletes that presumption and enables the delivery of effective tools as a forethought, no longer an afterthought. We have extended the open source multi-language platform with a language-agnostic Instrumentation Framework, including (1) low-level, extremely low-overhead execution event interposition, built directly into the high-performance runtime; (2) shared language-agnostic instrumentation services, requiring minimal per-language specialization; and (3) versatile APIs for constructing many kinds of client tools without modifying the VM. A new design uses this framework to implement debugging services for arbitrary languages (possibly in combination) with little effort from the language implementor. We show that, when optimized, the service has no measurable overhead and generalizes to other kinds of tools. It is now possible for a client in a production environment, with thread safety, to dynamically insert into an executing program an instrumentation probe that incurs near zero performance cost until actually used to access (or modify) execution state. Other applications include tracing and stepping required by some languages, as well as platform requirements such as the need to timebox script executions. Finally, opening public API access to runtime state encourages advanced tool development and experimentation with much reduced effort.
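The shape of such an instrumentation API can be suggested with a small hypothetical Java sketch; the names below are illustrative only and do not correspond to the framework's real interface:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical shape of a language-agnostic instrumentation API: a probe
    // sits at a program location and costs (almost) nothing until a tool
    // attaches a listener.
    interface EventListener {
        void onEnter(String location);
        void onReturn(String location, Object value);
    }

    final class Probe {
        private final String location;
        private final List<EventListener> listeners = new ArrayList<>();

        Probe(String location) { this.location = location; }

        void attach(EventListener l) { listeners.add(l); }   // e.g., a debugger

        // Called from the executing program; an optimizing runtime compiles
        // this to nothing while 'listeners' is empty and deoptimizes the
        // enclosing code only when a listener attaches.
        void notifyEnter() {
            for (EventListener l : listeners) l.onEnter(location);
        }
        void notifyReturn(Object value) {
            for (EventListener l : listeners) l.onReturn(location, value);
        }
    }

The near-zero-cost claim in the abstract corresponds to the empty-listener fast path being compiled away until a tool actually attaches.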
Improving the Scalability of Automatic Linearizability Checking in SPIN
Concurrency in data structures is crucial to the performance of multithreaded programs in shared-memory multiprocessor environments. However, greater concurrency also increases the difficulty of verifying the correctness of the data structure. Model checking has been used for verifying that concurrent data structures satisfy the correctness condition ‘linearizability’. In particular, ‘automatic’ tools achieve verification without requiring user-specified linearization points. This has several advantages, but is generally not scalable. We examine the automatic checking used by Vechev et al. in [VYY09] to understand the scalability issues of automatic checking in SPIN. We then describe a new, more scalable automatic technique based on these insights, and present the results of a proof-of-concept implementation.
Just-In-Time GPU Compilation of Interpreted Programs with Profile-Driven Specialization
Computer systems are increasingly featuring powerful parallel devices with the advent of manycore CPUs, GPUs and FPGAs. This offers the opportunity to solve large computationally-intensive problems at a fraction of the time of traditional CPUs. However, exploiting this heterogeneous hardware requires the use of low-level programming languages such as OpenCL, which is incredibly challenging, even for advanced programmers. On the application side, interpreted dynamic languages are increasingly becoming popular in many emerging domains for their simplicity, expressiveness and flexibility. However, this creates a wide gap between the nice high-level abstractions offered to non-expert programmers and the low-level hardware-specific interface. Currently, programmers have to rely on specialized high performance libraries or are forced to write parts of their application in a low-level language like OpenCL. Ideally, programmers should be able to exploit heterogeneous hardware directly from their interpreted dynamic languages. In this paper, we present a technique to transparently and automatically offload computations from interpreted dynamic languages to heterogeneous devices. Using just-in-time compilation, we automatically generate OpenCL code at runtime which is specialized to the actual observed data types using profiling information. We demonstrate our technique using R, a popular interpreted dynamic language predominately used in big data analytics. Our experimental results show execution on a GPU yields speedups of over 150x when compared to the sequential FastR implementation, and performance is competitive with manually written GPU code. We also show that when taking into account startup time, large speedups are achievable, even when the application runs for as little as a few seconds.
Defense against Cache-Based Side Channel Attacks for Secure Cloud Computing
Cloud computing is a combination of various established technologies like virtualization, dynamic elasticity, broadband Internet, etc. to provide configurable computer resources as a service to users. Resources are shared among many distrusting clients by abstracting the underlying infrastructure using virtualization. While cloud computing has many practical benefits, resource sharing in cloud computing raises the threat of Cache-Based Side Channel Attacks (CSCA). In this paper a solution is proposed to detect CSCA and to protect guest Virtual Machines (VMs) from it. Cache miss patterns are analyzed in this solution to detect side channel attacks. A notification channel between the client and the cloud service provider (CSP) is introduced to notify the CSP of the client's consent to run the prevention mechanism. A cache decay mechanism with a random decay interval is used as the prevention mechanism in the proposed solution. The performance of the proposed solution is compared with previous solutions, and the results indicate that this solution has the least performance overhead, a constant detection rate, and compatibility with the existing cloud computing model.
On Dynamic Information-Flow Analysis for Object-Oriented Programs
Information-flow security vulnerabilities, such as confidentiality and integrity violations, are real and serious problems found commonly in real-world software. Static analyses for information-flow control have the advantage of providing full coverage compared to dynamic analyses, as all possible security violations in the program need to be identified. On the other hand, dynamic information-flow analyses can offer distinct advantages in precision because they are less conservative than static analyses, rejecting only insecure executions instead of whole programs and providing additional accuracy via flow- and path-sensitivity. This talk will highlight some of our attempts to detect information-based security vulnerabilities in Java programs. In particular, we will discuss our investigation of dynamic program analysis for enforcing information-flow security in object-oriented programs. Even though we are able to obtain a soundness result for the analysis by formalising a core language and a generalised operational semantics that tracks explicit and implicit information propagation at runtime, we find it is fundamentally limited and practically infeasible to develop a purely dynamic analysis for information-flow security in the presence of shared objects and aliases.
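For a minimal flavor of the dynamic approach discussed here, consider a toy Java sketch that labels values and propagates labels through operations; the talk's analysis is far more general and also tracks implicit flows, which this sketch omits:

    // Toy dynamic information-flow tracking: every value carries a security
    // label, and operations propagate the join of their inputs' labels.
    enum Level { PUBLIC, SECRET }

    record Labeled(int value, Level label) {
        static Labeled add(Labeled a, Labeled b) {
            Level join = (a.label() == Level.SECRET || b.label() == Level.SECRET)
                    ? Level.SECRET : Level.PUBLIC;
            return new Labeled(a.value() + b.value(), join);   // explicit flow
        }

        // A sink such as a network write rejects insufficiently low labels,
        // rejecting only this insecure execution rather than the whole program.
        void sinkPublic() {
            if (label() == Level.SECRET)
                throw new SecurityException("secret data reaching a public sink");
            System.out.println(value());
        }
    }

The difficulty the talk identifies shows up as soon as labeled values are stored in shared, aliased objects: the monitor then has to account for flows through the heap that no purely per-value scheme like this can see.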
Practical Partial Evaluation for High-Performance Dynamic Language Runtimes
Most high-performance dynamic language virtual machines duplicate language semantics in the interpreter, compiler, and runtime system, violating the principle to not repeat yourself. In contrast, we define languages solely by writing an interpreter. Compiled code is derived automatically using partial evaluation (the first Futamura projection). The interpreter performs specializations, e.g., augments the interpreted program with type information and profiling information. Partial evaluation incorporates these specializations. This makes partial evaluation practical in the context of dynamic languages, because it reduces the size of the compiled code while still compiling all parts of an operation that are relevant for a particular program. Deoptimization to the interpreter, re-specialization in the interpreter, and recompilation embrace the dynamic nature of languages. We evaluate our approach by comparing newly built JavaScript, Ruby, and R runtimes with current specialized production implementations of those languages. Our general purpose compilation system is competitive with production systems even when they have been heavily specialized and optimized for one language.
Intrusion Detection of a Simulated SCADA System using Data-Driven Modeling
Supervisory Control and Data Acquisition (SCADA) systems have become integrated into many industries that have a need for control and automation. Examples of these industries include energy, water, transportation, and petroleum. A typical SCADA system consists of field equipment for process actuation and control, along with proprietary communication protocols. These protocols are used to communicate between the field equipment and the monitoring equipment located at a central facility. Given that distribution of vital resources is often controlled by this type of system, there is a need to secure the networked compute and control elements from users with malicious intent. This paper investigates the use of data-driven modeling techniques to identify various types of intrusions tested against a simulated SCADA system. The test bed uses three enterprise servers that were part of a university engineering Linux cluster. These were isolated so that job queries on the cluster would not be reflected in the normal behavior of the test bed, and to ensure that intrusion testing would not affect other components of the cluster. One server acts as a Master Terminal Unit (MTU), which simulates control and data acquisition processes. The other two act as Remote Terminal Units (RTUs), which simulate monitoring and telemetry transmission. All servers use Ubuntu 14.04 as the OS. A separate workstation using Kali Linux acts as a Human Machine Interface (HMI), which is used to monitor the simulation and perform intrusion testing. Monitored telemetry included network traffic along with digitized time-series signatures from hardware and software. The models used in this research include the Auto Associative Kernel Regression (AAKR) and the Auto Associative Multivariate State Estimation Technique (AAMSET) [1, 2]. This type of intrusion detection can be classified as a behavior-based technique, wherein data collected while the system exhibits normal behavior is first used to train and optimize the previously mentioned machine learning models. Any future monitored telemetry that deviates from this normal behavior can be treated as anomalous, and may indicate an attack against the system. Models were tested to evaluate their prognostic effectiveness when monitoring clusters of signals from four classes of telemetry: a combination of all telemetry signals, memory and CPU usage, disk usage, and TCP/IP statistics. Anomaly detection is performed by using the Sequential Probability Ratio Test (SPRT), which is a binary sequential statistical test developed by Wald [3]. This test determines whether the monitored observation has a mean or variance shifted from defined normal behavior [4]. For the prognostic security experiments reported in this paper, we established rigorous quantitative functional requirements for evaluating the outcome of the intrusion-signature fault injection experiments. These were a high accuracy for model predictions of dynamic telemetry metrics, and ultralow False Alarm and Missed Alarm Probabilities (FAPs and MAPs)...
SimSPRT-II: Monte Carlo Simulation of Sequential Probability Ratio Test Algorithms for Optimal Prognostic Performance
New prognostic AI innovations are being developed, optimized, and productized for enhancing the reliability, availability, and serviceability of enterprise servers and data centers, a field known as Electronic Prognostics (EP). EP innovations are now being spun off for prognostic cyber-security applications, and for Internet-of-Things (IoT) prognostic applications in the industrial sectors of manufacturing, transportation, and utilities. For these applications, the function of prognostic anomaly detection is achieved by predicting what each monitored signal “should be” via highly accurate empirical nonlinear nonparametric (NLNP) regression algorithms, and then differencing the optimal signal estimates from the real measured signals to produce “residuals”. The residuals are then monitored with a Sequential Probability Ratio Test (SPRT). The advantage of the SPRT, when tuned properly, is that it provides the earliest mathematically possible annunciation of anomalies growing into time-series signals for a wide range of complex engineering applications. SimSPRT-II is a comprehensive parametric Monte Carlo simulation framework for tuning, optimization, and performance evaluation of SPRT algorithms for any type of digitized time-series signal. SimSPRT-II enables users to systematically optimize SPRT performance as a multivariate function of Type-I and Type-II errors, variance, sampling density, and system disturbance magnitude, and then quickly evaluate what we believe to be the most important overall prognostic performance metrics for real-time applications: empirical False and Missed-Alarm Probabilities (FAPs and MAPs), SPRT tripping frequency as a function of anomaly severity, and overhead compute cost as a function of sampling density. SimSPRT-II has become a vital tool for tuning, optimization, and formal validation of SPRT-based AI algorithms in a broad range of engineering and security prognostic applications.
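The SPRT at the heart of such a framework is compact. The following Java sketch implements Wald's test for a positive mean shift in residuals with known variance; the Gaussian assumption and all parameters are illustrative, not SimSPRT-II's actual interface.

    // Sketch of Wald's Sequential Probability Ratio Test for detecting a
    // positive mean shift m in residuals with known variance sigma^2.
    // Thresholds use the standard Wald approximations from alpha/beta.
    final class Sprt {
        private final double m, sigma2, upper, lower;
        private double llr;   // cumulative log-likelihood ratio

        Sprt(double meanShift, double variance, double alpha, double beta) {
            this.m = meanShift;
            this.sigma2 = variance;
            this.upper = Math.log((1 - beta) / alpha);  // decide "anomaly"
            this.lower = Math.log(beta / (1 - alpha));  // decide "normal"
        }

        enum Decision { CONTINUE, NORMAL, ANOMALY }

        Decision observe(double x) {
            // Gaussian log-likelihood-ratio increment: (m/sigma^2)(x - m/2).
            llr += (m / sigma2) * (x - m / 2);
            if (llr >= upper) { llr = 0; return Decision.ANOMALY; }
            if (llr <= lower) { llr = 0; return Decision.NORMAL; }
            return Decision.CONTINUE;
        }
    }

Tuning, in the sense described above, means choosing alpha, beta, the assumed disturbance magnitude, and the sampling density so that the empirical false- and missed-alarm rates meet the application's requirements.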
SimML Framework: Monte Carlo Simulation of Statistical Machine Learning Algorithms for IoT Prognostic Applications
Advanced statistical machine learning (ML) algorithms are being developed, trained, tuned, optimized, and validated for real-time prognostics for internet-of-things (IoT) applications in the fields of manufacturing, transportation, and utilities. For such applications, we have achieved greatest prognostic success with ML algorithms from a class of pattern recognition known as nonlinear, nonparametric regression. To intercompare candidate ML algorithmics to identify the “best” algorithms for IoT prognostic applications, we use three quantitative performance metrics: false alarm probability (FAP), missed alarm probability (MAP), and overhead compute cost (CC) for real-time surveillance. This paper presents a comprehensive framework, SimML, for systematic parametric evaluation of statistical ML algorithmics for IoT prognostic applications. SimML evaluates quantitative FAP, MAP, and CC performance as a parametric function of input signals’ degree of cross-correlation, signal-to-noise ratio, number of input signals, sampling rates for the input signals, and number of training vectors selected for training. Output from SimML is provided in the form of 3D response surfaces for the performance metrics that are essential for comparing candidate ML algorithms in precise, quantitative terms.
Ruby’s C Extension Problem and How We're Solving It
Ruby’s C extensions have so far been the best way to improve the performance of Ruby code. Ironically, they are now holding performance back, because they expose the internals of Ruby and mean we aren’t free to make major changes to how Ruby works. In JRuby+Truffle we have a radical solution to this problem – we’re going to interpret the source code of your C extensions, just as Ruby interprets Ruby code. Combined with a JIT, this lets us optimise Ruby while keeping support for C extensions.
Points-To Analysis: Provenance Generation
The usual points-to analysis does not store the justification for the presence of a tuple in the points-to result. However, this is required for many client-driven queries, as provenance gives the client information that can be used in other contexts such as debugging. In this presentation, we describe our approach to generating provenance information using the results of a context-sensitive points-to analysis. It has been implemented using the DOOP framework and the Soufflé Datalog engine. Our use cases demand that the approach scale to large code-bases. We use four benchmarks derived from two versions of the JDK and use two realistic clients to demonstrate the effectiveness of our approach.
SLIDES: How to Tell a Compiler What We Think We Know?
Slides for an invited keynote talk at 2016 ACM SPLASH-I
Optimizing R Language Execution via Aggressive Speculation
The R language, from the point of view of language design and implementation, is a unique combination of various programming language concepts. It has functional characteristics like lazy evaluation of arguments, but also allows expressions to have arbitrary side effects. Many runtime data structures, for example variable scopes and functions, are accessible and can be modified while a program executes. Several different object models allow for structured programming, but the object models can interact in surprising ways with each other and with the base operations of R. R works well in practice, but it is complex, and it is a challenge for language developers trying to improve on the current state-of-the-art, which is the reference implementation – GNU R. The goal of this work is to demonstrate that, given the right approach and the right set of tools, it is possible to create an implementation of the R language that provides significantly better performance while keeping compatibility with the original implementation. In this paper we describe novel optimizations backed up by aggressive speculation techniques and implemented within FastR, an alternative R language implementation, utilizing Truffle – a JVM-based language development framework developed at Oracle Labs. We also provide experimental evidence demonstrating effectiveness of these optimizations in comparison with GNU R, as well as Renjin and TERR implementations of the R language.
smalltalkCI: A Continuous Integration Framework for Smalltalk Projects
Continuous integration (CI) is a programming practice that reduces the risk of project failure by integrating code changes multiple times a day. This has always been important to the Smalltalk community, so the community operates custom integration infrastructures that allow CI testing for Smalltalk projects shared in Monticello repositories or traditional changesets.
In the last few years, the open hosting platform GitHub has become more and more popular for Smalltalk projects. Unfortunately, there was no convenient way to enable CI testing for those projects.
We present smalltalkCI, a continuous integration framework for Smalltalk. It aims to provide a uniform way to load and test Smalltalk projects written in different Smalltalk dialects. smalltalkCI runs on Linux, macOS, and Windows, and can be used locally as well as on a remote server. In addition, it is compatible with Travis CI and AppVeyor, which allows developers to easily set up free CI testing for their GitHub projects without having to run a custom integration infrastructure.
Matriona: Class Nesting with Parameterization in Squeak/Smalltalk
We present Matriona, a module system for Squeak, a Smalltalk dialect. It supports class nesting and parameterization and is based on a hierarchical name lookup mechanism. Matriona solves a range of modularity issues in Squeak. Instead of a flat class organization, it provides a hierarchical namespace that avoids name clashes and allows for shorter local names. Furthermore, it provides a way to share behavior among classes and modules using mixins and class hierarchy inheritance (a form of inheritance that subclasses an entire class family), respectively. Finally, it allows modules to be externally configurable, which is a form of dependency management that decouples a module from the actual implementation of its dependencies. Matriona is implemented on top of Squeak by introducing a new keyword for run-time name lookups through a reflective mechanism, without modifying the underlying virtual machine. We evaluate Matriona with a series of small applications and demonstrate how its features can benefit modularity when porting a simple application written in plain Squeak to Matriona.
Optimizing R language execution via aggressive speculation
The R language, from the point of view of language design and implementation, is a unique combination of various programming language concepts. It has functional characteristics like lazy evaluation of arguments, but also allows expressions to have arbitrary side effects. Many runtime data structures, for example variable scopes and functions, are accessible and can be modified while a program executes. Several different object models allow for structured programming, but the object models can interact in surprising ways with each other and with the base operations of R. R works well in practice, but it is complex, and it is a challenge for language developers trying to improve on the current state-of-the-art, which is the reference implementation -- GNU R. The goal of this work is to demonstrate that, given the right approach and the right set of tools, it is possible to create an implementation of the R language that provides significantly better performance while keeping compatibility with the original implementation. In this paper we describe novel optimizations backed up by aggressive speculation techniques and implemented within FastR, an alternative R language implementation, utilizing Truffle -- a JVM-based language development framework developed at Oracle Labs. We also provide experimental evidence demonstrating effectiveness of these optimizations in comparison with GNU R, as well as Renjin and TERR implementations of the R language.
Bringing Low-Level Languages to the JVM: Efficient Execution of LLVM IR on Truffle
Although the Java platform has been used as a multi-language platform, most low-level languages (such as C, Fortran, and C++) cannot be executed efficiently on the JVM. We propose Sulong, a system that can execute LLVM-based languages on the JVM. By targeting LLVM IR, Sulong is able to execute C, Fortran, and other languages that can be compiled to LLVM IR. Sulong combines LLVM's static optimizations with dynamic compilation to reach a peak performance that is near the performance achievable with static compilers. For C benchmarks, Sulong's peak runtime performance is on average 1.39x slower (0.79x to 2.45x) compared to the performance of executables compiled by Clang with -O3. For Fortran benchmarks, Sulong is 2.63x slower (1.43x to 4.96x) than the performance of executables compiled by GCC with -O3. This low overhead makes Sulong an alternative to Java's native function interfaces. More importantly, it also allows other JVM language implementations to use Sulong for implementing their native interfaces.
How to Tell a Compiler What We Think We Know?
I have been repeatedly quoted (and tweeted) as having remarked more than once over the last decade, "If it's worth telling yourself (or another programmer), it's worth telling the compiler." In this talk, I will try to explain in more detail what I meant by this. In particular, I have noticed that programming languages provide lots of ways to annotate one thing, but not very many good ways to talk about relationships among multiple things (other than regarding one as a "server" to which an annotation is attached and the others as "clients"). As a very simple example, we don't even yet have a relatively standard way to say such simple things as "Thus-and-so value is an identity for this binary operation" or "this operation distributes over that operation". Algebraic constraints are one way to express some such constraints, but where in a program should they be placed? How can they be generalized and abstracted? Does object-oriented design make this task easier or harder? I am particularly interested in what we might want to say in the future to a compiler that incorporates a full-blown theorem prover. This talk will be a sort of oral essay, raising more questions than it answers.
Become Polyglot by learning Java!
In a world running at breakneck speed to embrace JavaScript, it is refreshing to see a project that embraces Java to provide a solution that deals with the new world and even improves it. I describe Truffle, a project that aims to build a multi-language, multi-tenant, multi-threaded, multi-node, multi-tooling and multi-system environment on top of the Java virtual machine, with the goal of forming the fastest and most flexible execution environment on the planet! Learn about Truffle and its Java APIs to become a real polyglot, use the best language for each task, and never ask again: Do I really have to use that crummy language?
FastR - Optimizing and Enhancing R Language Implementation
The current reference implementation of the R language, namely GNU R, is very mature and extremely popular. Nevertheless, alternative implementations are under development with the goal of improving and enhancing the current state-of-the-art. FastR is one such implementation created by Oracle Labs in collaboration with academic partners. FastR aims to deliver a fully compatible R language implementation that compiles R programs to efficient native code, but which at the same time constitutes an experimentation platform for enhancing some of the existing R capabilities, for example with respect to parallel execution. FastR is built upon an infrastructure consisting of an optimizing compiler called Graal and of the Truffle framework, which simplifies the creation of new language runtimes that can then interface with Graal. The infrastructure is specifically designed to support the creation of dynamic languages, such as R, by taking advantage of runtime execution profiling and aggressive optimistic optimizations during the compilation process. In this talk I will describe how the Graal/Truffle infrastructure enables some of the optimizations in FastR's runtime and demonstrate how effective these optimizations are in practice based on an experimental performance evaluation. I will also present our work on enhancing R, in particular with respect to parallel computation capabilities, by supplanting GNU R’s process-based model (as defined in the parallel or snowfall packages) with an API-compatible thread-based model in which communication between different parts of a parallel computation occurs over shared-memory channels.
Polyglot on the JVM with Graal
What Went Wrong? Automatic Triage of Precision Loss During Static Analysis of JavaScript
Static analysis tools tend to have insufficient means to debug a complex notion such as precision, which in our experience leads to time-consuming human analysis. We propose to augment the analysis framework in such a way that it keeps track of the loss of precision throughout the analysis. This precision tracking information brings us one step closer to pinpointing the reason why our analysis fails. In this talk, we will detail our motivation for precision tracking and our experience with it, in the context of static analysis with the SAFE framework aimed at real-world JavaScript applications.
Who reordered my code?!
There is a hidden problem waiting as Ruby becomes 3x faster and starts to support parallel computation - reordering by JIT compilers and CPUs. In this talk, we’ll start by trying to optimize a few simple Ruby snippets. We’ll play the role of a JIT and a CPU and reorder operations as the rules of the system allow. Then we’ll add a second thread to the snippets and watch them break horribly. In the second part, we’ll fix the unwanted reorderings by introducing a memory model to Ruby. We’ll discuss in detail how it fixes the snippets and how it can be used to write faster code for parallel execution.
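The hazard in question can be illustrated in Java (the talk itself uses Ruby snippets): without a memory-model constraint on the flag, the JIT or the CPU may reorder the two writes, so a reader can observe the flag before the data.

    // Classic reordering hazard: without 'volatile', the JIT or CPU may
    // reorder the writes to 'data' and 'ready', so the reader can see
    // ready == true while data is still 0. Declaring 'ready' volatile
    // restores the intended order under the Java memory model
    // (release semantics on the write, acquire semantics on the read).
    final class Handoff {
        int data;                  // payload
        volatile boolean ready;    // remove 'volatile' to expose the hazard

        void writer() {
            data = 42;
            ready = true;          // release: publishes the write to 'data'
        }

        void reader() {
            if (ready) {           // acquire: 'data' is guaranteed visible
                assert data == 42;
            }
        }
    }

A memory model for Ruby would give programmers an equivalent, well-defined way to mark such publication points so that JITs and CPUs know which reorderings are forbidden.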
A Tale of Two String Representations
Strings are used pervasively in Ruby. If we can make them faster, we can make many apps faster. In this talk, I will be introducing ropes: an immutable tree-based data structure for implementing strings. While an old idea, ropes provide a new way of looking at string performance and mutability in Ruby. I will describe how we replaced a byte array-oriented string representation with a rope-based one in JRuby+Truffle. Then we’ll look at how moving to ropes affects common string operations, its immediate performance impact, and how ropes can have cascading performance implications for apps.
One Compiler: Deoptimization to Optimized Code
Deoptimization enables speculative compiler optimizations, which are an essential part of nearly every high-performance virtual machine (VM). But it comes with a cost: a separate first-tier interpreter or baseline compiler in addition to the optimizing compiler. Because such a first-tier execution uses a fixed stack frame layout, this affects all VM components that need to walk the stack. We propose to use the optimizing compiler also to compile deoptimization target code, i.e., the non-speculative code where execution continues after a deoptimization. Deoptimization entry points are described with the same scope descriptors used to describe the origin of the deoptimization, i.e., deoptimization is a two-way matching of two scope descriptors describing the same abstract frame. We use this deoptimization approach in a high-performance JavaScript VM written in Java. It strictly uses a one-compiler approach, i.e., all frames on the stack (VM runtime, first-tier execution in a JavaScript AST interpreter, dynamic compilation, deoptimization entry points) originate from the same compiler. Code with deoptimization entry points generated by the optimizing compiler imposes a much smaller overhead than a traditional first-tier execution.
Using LLVM and Sulong for Language C Extensions
Many languages such as Ruby, Python and JavaScript support extension modules written in C, either for speed or to create interfaces to native libraries. Ironically, these extensions can hold back the performance of the languages themselves, because the native interfaces expose implementation details about how the language was first implemented, such as the layout of data structures. In JRuby+Truffle, an implementation of Ruby, we are using the Sulong LLVM bitcode interpreter to run C extensions on the JVM. By combining LLVM's static optimizations with dynamic compilation, Sulong is fast, but Sulong also gives us a powerful new tool - it allows us to abstract from normal C semantics and to appear to provide the same native API while actually mapping it to our own alternative data structures and implementation. We'll demonstrate Sulong and how we're using it to implement Ruby C extensions.
Frappé Bug Trace Overview Slides for Prof Sukyoung Ryu (KAIST University)
Overview slides of the bug trace extensions to Frappé and the Frappé architecture.
Using Domain-Specific Languages for Analytic Graph Databases
Recently, graphs have been drawing a lot of attention, both as a natural data model that captures fine-grained relationships between data entities and as a tool for powerful data analysis that considers such relationships. In this paper, we present a new graph database system that integrates a robust graph storage with an efficient graph analytics engine. Primarily, our system adopts two domain-specific languages, one for describing graph analysis algorithms and the other for graph pattern-matching queries. Compared to the API-based approaches of conventional graph processing systems, the DSL-based approach provides users with more flexible and intuitive ways of expressing algorithms and queries. Moreover, the DSL-based approach has significant performance benefits as well, by skipping (remote) API invocation overhead and by applying high-level optimization in the compiler.
Ahead-of-time Compilation of FastR Functions Using Static Analysis
The FastR project delivers high peak performance through the use of JIT compilation, but cannot currently provide this performance for methods on first call. This especially affects startup performance and the performance of applications that only call functions once, possibly with large inputs (i.e., data processing). This project presents an approach and the necessary patterns for implementing an AOT-compilation facility within FastR, enabling compilation of call targets just before they are first called. The AOT compilation produces code that has profiling and specialization information tailored to the expected function argument values for the first call, without needing to execute the function in full. The performance results show a clear and unambiguous performance gain for first-call performance of AOT-compiled functions (up to 4x faster, excluding compilation time). Due to constant compilation time, there is the potential for overall startup performance improvement for long-running functions even when compilation time is included. While the static analysis itself imposes almost no overhead, compilation times are up to 1.4x higher than with regularly compiled code, due to the inherent imprecision of the current analysis. Although peak performance is reduced, AOT compilation can be the solution where faster first-call performance, the possibility of offloading/remote execution, and more performance predictability are important.
Adaptive Detection Technique for Cache Based Side Channel Attack using Bloom Filter for Secure Cloud
Security is one of the main concerns in the field of cloud computing. Different users sharing the same physical machines, or even the same software, on a frequent basis makes the cloud vulnerable to many security threats. Side channel attacks are the most probable attacks in the cloud because of this physical resource sharing. In the cloud, multiple Virtual Machines (VMs) sharing the same physical machine create a great opportunity to carry out a Cache-based Side Channel Attack (CSCA). In this paper, a novel detection technique for CSCA using a Bloom Filter (BF) is designed. This technique treats the cache miss sequence as a signature of a CSCA and uses a difference mean calculator to generate these signatures. The technique is adaptive, which makes it possible to detect CSCAs with new, previously unobserved patterns. A Bloom filter is used to keep the performance overhead of the technique to a minimum. The solution was implemented with a cache simulator and proved very effective, requiring far less execution time than the CSCA itself.
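As a rough illustration of the data structure involved (not the paper's implementation; names and hashing scheme below are our assumptions), a Bloom filter answers set-membership queries over a small fixed-size bit array, trading occasional false positives for very low time and space overhead:

    import java.util.BitSet;

    // Minimal Bloom filter sketch for storing signatures cheaply (illustrative).
    class BloomFilter {
        private final BitSet bits;
        private final int size;
        private final int hashes;

        BloomFilter(int size, int hashes) {
            this.bits = new BitSet(size);
            this.size = size;
            this.hashes = hashes;
        }

        // Derive k probe indices from two base hashes (double hashing).
        private int index(String item, int i) {
            int h1 = item.hashCode();
            int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
            return Math.floorMod(h1 + i * h2, size);
        }

        void add(String signature) {
            for (int i = 0; i < hashes; i++) bits.set(index(signature, i));
        }

        // False positives are possible; false negatives are not.
        boolean mightContain(String signature) {
            for (int i = 0; i < hashes; i++)
                if (!bits.get(index(signature, i))) return false;
            return true;
        }
    }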
Self-Specialising Interpreters and Partial Evaluation
Abstract syntax trees are a simple way to represent programs and to implement language interpreters. They can also be an easy way to produce high-performance dynamic compilers, by combining them with self-specialisation and partial evaluation. Self-specialisation allows the nodes in a program tree to rewrite themselves with more specialised variants in order to increase performance, such as replacing method calls with inline caches or replacing stronger operations with weaker ones based on profiled types. Partial evaluation can then take this specialised abstract syntax tree and produce optimised machine code from it. We'll show how these two techniques work, how they have been implemented by Oracle Labs in Truffle and Graal, and how they are used in implementations of languages including JavaScript, C, Ruby, R and more.
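The node-rewriting idea can be sketched in a few lines of Java; this is an illustrative toy, not the Truffle API. An unspecialised add node observes its operand types on first execution and replaces itself with an integer-specialised variant, which despecialises again if its assumption later fails:

    // Toy self-specialising AST nodes (illustrative; not the Truffle API).
    abstract class ExprNode {
        ExprNode replacement; // set when this node rewrites itself in the tree
        abstract Object execute();
        ExprNode rewriteTo(ExprNode newNode) { replacement = newNode; return newNode; }
    }

    class AddNode extends ExprNode {
        final ExprNode left, right;
        AddNode(ExprNode left, ExprNode right) { this.left = left; this.right = right; }
        Object execute() { // uninitialised case: profile the operands, then rewrite
            Object l = left.execute(), r = right.execute();
            if (l instanceof Integer && r instanceof Integer) {
                rewriteTo(new IntAddNode(left, right)); // specialise for next time
                return (Integer) l + (Integer) r;
            }
            return genericAdd(l, r); // stays generic for other types
        }
        static Object genericAdd(Object l, Object r) { return l.toString() + r; }
    }

    class IntAddNode extends ExprNode {
        final ExprNode left, right;
        IntAddNode(ExprNode left, ExprNode right) { this.left = left; this.right = right; }
        Object execute() {
            Object l = left.execute(), r = right.execute();
            if (!(l instanceof Integer) || !(r instanceof Integer)) {
                rewriteTo(new AddNode(left, right)); // speculation failed: despecialise
                return AddNode.genericAdd(l, r);
            }
            return (Integer) l + (Integer) r; // partial evaluation sees a plain int add
        }
    }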
Efficient analysis using Soufflé - An experience report
Soufflé is an open-source programming framework for static program analysis. It enables the analysis designer to express static program analyses over very large code bases, such as a points-to analysis for the Java Development Kit (JDK), which has more than 1.5 million variables and 600 thousand call sites. Soufflé employs a Datalog-like language as a domain-specific language for static program analysis. Its finite domain semantics lends itself to efficient execution on parallel hardware using various levels of program specialisation. A specialisation hierarchy is applied to a Datalog program, producing highly specialised and optimised C++ code that harvests the computational power of modern shared-memory/multi-core computer architectures. We have been using Soufflé to explore and develop vulnerability detection analyses on the Java platform, using JDK 7, 8 and 9. These vulnerability detection analyses make use of points-to analysis (reusing parts of the DOOP framework), taint analysis, escape analysis, and other data-flow-based analyses. In this talk we report on the types of analyses used, the sizes of the input and computed relations, and the runtime and memory requirements for the analyses of such large codebases. For program specialisation, we use several translation steps. In each translation step, new optimisation opportunities open up that could not be exploited in the previous step. The first translation uses a Futamura projection to translate the declarative Datalog program into an imperative relational program for an abstract machine which we call the Relational Algebra Machine (RAM). The RAM program contains relational algebra operations to compute the results produced by clauses, relation management operations to keep track of previous, current and new knowledge in the semi-naive evaluation, and imperative constructs, including statement composition for sequencing operations and loops with exit conditions to express the fixed-point computations for recursively defined relations. It also has support for parallelism. The next translation step translates the optimised RAM program into a C++ program that uses meta-programming techniques with templates. The last translation step compiles this C++ program to an executable binary. Operations for emptiness and existence checks, range queries, insertions and unions are highly efficient because portions of the operations are pushed from runtime to compile time using meta-programming techniques. We now outline some of the novel aspects of the implementation of Soufflé. The first concerns indices. Since indices are costly, a minimal set of indices for a given relation is desired. We solve a discrete optimisation problem to create only the indices required for execution, thereby avoiding redundancies. The second is the choice of data structures to represent large relations...
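For readers unfamiliar with the semi-naive evaluation mentioned above, the following small Java sketch (our illustration, not Soufflé's generated code) computes the classic transitive-closure program path(x,y) :- edge(x,y). path(x,z) :- path(x,y), edge(y,z). Each iteration joins only the newly derived "delta" tuples against edge, which is the fixed-point strategy the RAM's relation management operations implement:

    import java.util.*;

    // Semi-naive evaluation of transitive closure over an edge relation.
    class SemiNaive {
        static Set<List<Integer>> transitiveClosure(Set<List<Integer>> edge) {
            Set<List<Integer>> path = new HashSet<>(edge);  // all knowledge so far
            Set<List<Integer>> delta = new HashSet<>(edge); // newly derived tuples
            while (!delta.isEmpty()) {
                Set<List<Integer>> next = new HashSet<>();
                for (List<Integer> p : delta)               // join only the delta
                    for (List<Integer> e : edge)
                        if (p.get(1).equals(e.get(0))) {
                            List<Integer> t = List.of(p.get(0), e.get(1));
                            if (!path.contains(t)) next.add(t);
                        }
                path.addAll(next);
                delta = next;                               // iterate to a fixed point
            }
            return path;
        }
    }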
One Compiler
The stack of a running Java HotSpot VM has stack frames from multiple compilers (the C compiler, the client compiler, and the server compiler) as well as bytecode interpreter stack frames. That complicates essential VM tasks (stack walking, garbage collection, and deoptimization), increases maintenance costs, and makes porting to new hardware architectures difficult. We argue that a single compiler is sufficient: using the Graal compiler in different configurations, we can execute Java, JavaScript, and many other languages. The stack then contains only frames produced by that one compiler: frames from ahead-of-time compiled code, interpreter frames (from an ahead-of-time compiled AST interpreter), frames from just-in-time compiled code, and deoptimized frames (ahead-of-time compiled code with deoptimization entry points). In this talk, we outline the necessary components of such a streamlined system: deoptimization to compiled frames (in contrast to deoptimization to interpreter frames), access to low-level OS data structures directly from Java, and writing the whole runtime system (including the garbage collector) in Java.
Testing Security Properties in Java
In this paper we describe our initial experience of using mutation testing of Java programs to evaluate the quality of test suites from a security viewpoint. Our focus is on measuring the quality of the test suite associated with the Java Development Kit (JDK) because it provides the core security properties for all applications. We define security-specific mutation operators and determine their usefulness by executing some of the test suites that are publicly available. We summarise our findings and also outline some of the key challenges that remain before mutation testing can be used in practice.
Are We Ready for Secure Languages? (CurryOn presentation)
Language designers and developers want better ways to write good code — languages designed with simpler, more powerful abstractions accessible to a larger community of developers. However, language design does not seem to take security into account, leaving developers with the onerous task of writing attack-proof code. In 20 years, we have gone from 25 reported vulnerabilities to 6,883 vulnerabilities. We see some of the most common vulnerabilities happening in commonly used software — cross-site scripting, SQL injections, and buffer overflows. Attacks are becoming sophisticated, often exploiting three or four weaknesses, making it harder for developers to reason about the source of the problem. I’ll overview some recent attacks and argue that our languages must take security seriously. Languages need security-oriented constructs, and compilers must let developers know when there is a problem with their code. We need to empower developers with the concept of “security for the masses” by making available languages that do not necessarily require an expert in order to determine whether the code being written is vulnerable to attack or not.
Efficient and Thread-Safe Objects for Dynamically-Typed Languages
We are in the multi-core era. Dynamically-typed languages are in widespread use, but their support for multithreading still lags behind. One of the reasons is that the sophisticated techniques they use to efficiently represent their dynamic object models are often unsafe in multithreaded environments. This paper defines safety requirements for dynamic object models in multithreaded environments. Based on these requirements, a language-agnostic and thread-safe object model is designed that maintains the efficiency of sequential approaches. This is achieved by ensuring that field reads do not require synchronization and field updates only need to synchronize on objects shared between threads. Basing our work on JRuby+Truffle, we show that our safe object model has zero overhead on peak performance for thread-local objects and only 3% average overhead on parallel benchmarks where field updates require synchronization. Thus, it can be a foundation for safe and efficient multithreaded VMs for a wide range of dynamic languages.
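The core of the synchronisation strategy can be sketched as follows (a minimal sketch with invented names, not the paper's actual object model): reads are plain field accesses, and writes take a lock only once the object has been marked as shared between threads, so thread-local objects keep their sequential performance:

    // Illustrative object storage: unsynchronised reads, writes that lock
    // only for objects known to be reachable by more than one thread.
    class DynObject {
        private volatile boolean shared = false; // set when the object escapes a thread
        private final Object[] storage = new Object[8];

        Object read(int slot) {
            return storage[slot];                // no synchronisation on reads
        }

        void write(int slot, Object value) {
            if (shared) {
                synchronized (this) { storage[slot] = value; } // shared: synchronise
            } else {
                storage[slot] = value;           // thread-local: plain store
            }
        }

        void markShared() { shared = true; }     // e.g. when stored into a shared object
    }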
Gems: shared-memory parallel programming for Node.JS
JavaScript is the most popular programming language for client-side Web applications, and Node.js has popularized the language for server-side computing, too. In this domain, however, the minimal support for parallel programming remains a major limitation. In this paper we introduce a novel parallel programming abstraction called Generic Messages (GEMS). GEMS allow one to combine message passing and shared-memory parallelism, extending the classes of parallel applications that can be built with Node.js. GEMS have customizable semantics and enable several forms of thread safety, isolation, and concurrency control. GEMS are designed as convenient JavaScript abstractions that expose high-level and safe parallelism models to the developer. Experiments show that GEMS outperform equivalent Node.js applications thanks to their use of shared memory.
Are We Ready For Secure Languages? (CurryOn slides)
Language designers and developers want better ways to write good code — languages designed with simpler, more powerful abstractions accessible to a larger community of developers. However, language design does not seem to take security into account, leaving developers with the onerous task of writing attack-proof code. In 20 years, we have gone from 25 reported vulnerabilities to 6,883 vulnerabilities. We see some of the most common vulnerabilities happening in commonly used software — cross-site scripting, SQL injections, and buffer overflows. Attacks are becoming sophisticated, often exploiting three or four weaknesses, making it harder for developers to reason about the source of the problem. I’ll overview some recent attacks and argue that our languages must take security seriously. Languages need security-oriented constructs, and compilers must let developers know when there is a problem with their code. We need to empower developers with the concept of “security for the masses” by making available languages that do not necessarily require an expert in order to determine whether the code being written is vulnerable to attack or not.
Toward a More Carefully Specified Metanotation
POPL is known for, among other things, papers that present formal descriptions and rigorous analyses of programming languages. But an important language has been neglected: the metanotation of inference rules and BNF that has been used in over 40% of all POPL papers to describe all the other programming languages. This metanotation is not completely described in any one place; rather, it is a folk language that has grown over the years, as paper after paper tries out variations and extensions. We believe that it is high time that the tools of the POPL trade be applied to the tools themselves. Examination of many POPL papers suggests that as the metanotation has grown, it has diversified to the point that problems are surfacing: different notations are in use for the same operation (substitution); the same notation is in use for different operations; and in some cases, notations for repetition are ambiguous, or require the reader to apply knowledge of semantics to interpret the syntax. All three problems present substantial potential for confusion. No individual paper is at fault; rather, this is the natural result of language growth in a community, producing incompatible dialects. We back these claims by presenting statistics from a survey of all past POPL papers, 1973–2016, and examples drawn from those papers. We propose a set of design principles for metanotation, and then propose a specific version of the metanotation that can always be interpreted in a purely formal, syntactic manner and yet is reasonably compatible with past use. Our goal is to lay a foundation for complete formalization and mechanization of the metanotation.
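As an example of the notational questions at stake (our illustration, not drawn from any particular paper), consider the common overline idiom for repetition in an inference rule, here a typing rule for records; whether the overline binds the index i, and how many premises it denotes, is exactly the kind of convention the paper proposes to pin down:

    % Overline (vector) notation for repetition in an inference rule (LaTeX).
    \[
      \frac{\overline{\Gamma \vdash e_i : \tau_i}}
           {\Gamma \vdash \{\, \overline{\ell_i = e_i} \,\} : \{\, \overline{\ell_i : \tau_i} \,\}}
    \]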
FastR presentation at RIOT 2016 workshop
This is the presentation about FastR at the RIOT 2016 workshop (which we organize). The audience consists of members of the core R group, developers of other implementations of the R language, and people developing tooling for R. The main focus of our presence at this workshop is to build credibility and show that we know what we're doing. The contents of this presentation are a combination of the usual Truffle interoperability and Graal introduction, a bit of compiler 101 (the audience does not have a compiler-construction background), some bits from our recent paper (2016-0523), and the presentation I gave at useR! (2016-0540).
High-performance R with FastR
R is a highly dynamic language that employs a unique combination of data type immutability, lazy evaluation, argument matching, a large amount of built-in functionality, and interaction with C and Fortran code. While these are straightforward to implement in an interpreter, they make it hard to compile R functions to efficient bytecode or machine code. Consequently, applications that spend a lot of time in R code often have performance problems. Common solutions are to apply primitives to large amounts of data at once and to convert R code to a native language like C. FastR is a novel approach to solving R’s performance problem. It makes extensive use of the dynamic optimization features provided by the Truffle framework to remove the abstractions that the R language introduces, and can use the Graal compiler to create optimized machine code on the fly. This talk introduces FastR and the basic concepts behind Truffle’s optimization features. It provides examples of the language constructs that are particularly hard to implement using traditional compiler techniques, and shows how to use FastR to improve performance without compromising on language features.
Zero-Overhead Integration of R, JS, Ruby and C/C++
Presentation about FastR and language interoperability at the useR! 2016 conference.
Audio/Video recording of "Zero-Overhead Integration of R, JS, Ruby and C/C++"
Presentation about FastR and language interoperability at the useR! 2016 conference. Stanford is asking for permission to record presentations and publish those recordings.
EPA: A Precise and Scalable Object-Sensitive Points-to Analysis for Large Programs
Points-to analysis is a fundamental static program analysis technique for tools including compilers and bug-checkers. There are several kinds of points-to analyses that trade off precision against runtime. For object-oriented languages including Java, “context-sensitivity” is key to obtaining sufficient precision. A context may be parameterizable, and may consider calls, objects, and types for its construction. Although points-to analysis research has received a lot of attention in the past, scaling object-sensitive points-to analysis to large Java code bases still remains an open research challenge. In this paper, we develop an Eclectic Points-To Analysis (EPA) framework that computes an efficient, selective, object-sensitive points-to analysis that is client independent. This framework parameterizes context sensitivities for different allocation sites in the program. The level of required sensitivity is determined by a pre-analysis. We have implemented our approach using Soufflé (a Datalog compiler) and an extension of the DOOP framework. Our experiments on large programs including OpenJDK and Jython show that our technique is efficient and highly precise. For OpenJDK, an instance of the EPA-based analysis reduces runtime by 27% for a slight loss of precision, while for Jython the same analysis reduces runtime by 82% for almost no loss of precision.
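A standard two-line Java example (ours, for illustration) shows why context sensitivity pays off: a context-insensitive analysis merges the two calls to id and concludes that a and b may point to either allocation site, while a context-sensitive analysis keeps the results apart:

    // Why context sensitivity matters for points-to precision.
    class Id {
        static Object id(Object o) { return o; }

        public static void main(String[] args) {
            Object a = id(new Object()); // allocation site 1
            Object b = id(new Object()); // allocation site 2
            // context-insensitive: points-to(a) = points-to(b) = {site1, site2}
            // context-sensitive:   points-to(a) = {site1}, points-to(b) = {site2}
        }
    }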
Model Checking Cache Coherence in System-Level Code
Cache coherence is a key consistency requirement between the shared main memory and individual caches for a multiprocessor framework. Several months ago, we started a project to verify the cache coherence of a system-level C codebase (50,000+ lines), which runs in an environment that does not provide hardware-level guarantees, requiring programmers to ensure correct cache coherence manually through explicit FLUSH and INVALIDATE operations. After initial evaluation and comparison of many model checking tools, we believe that SPIN is the most suitable one. However, pure model checking is not sufficiently scalable to verify such a large codebase. Therefore, we are currently investigating a hybrid model checking solution with some static analysis techniques to reduce the model size via abstraction and program slicing, and restrict the interleavings explored. In this talk, we will share our model checking experiences. In particular, we will discuss (1) our evaluation of different model checking tools, (2) the Promela model we use to verify the cache coherence, (3) initial model checking experience for verifying the coherence in concurrent quicksort algorithm, and (4) the automatic model extraction from large codebase in C.
Fortress Features and Lessons Learned
Slides for an invited keynote talk on June 22, 2016, at the 2016 JuliaCon conference to be held at MIT at the Stata Center. This is an overview of the Fortress programming language, with some comparison to Scala. Many of the slides are taken from two previously approved slide sets (Archivist 2012-0104 and 2012-0284), but some have been updated, and some new slides have been created.
Truffle Tutorial: One VM to Rule Them All
Forget “this language is fast”, “this language has the libraries I need”, and “this language has the tool support I need”. The Truffle framework for implementing managed languages in Java gives you native performance, multi-language integration with all other Truffle languages, and tool support - all of that by just implementing an abstract syntax tree (AST) interpreter in Java. Truffle applies AST specialization during interpretation, which enables partial evaluation to create highly optimized native code without the need to write a compiler specifically for a language. The Java VM contributes high-performance garbage collection, threads, and parallelism support. This tutorial is aimed both at newcomers who want to learn the basic principles of Truffle and at people with Truffle experience who want to learn about recently added features. It presents the basic principles of the partial evaluation used by Truffle and the Truffle DSL used for type specializations, as well as recently added features such as the language-agnostic object model, language integration, and debugging support. Oracle Labs and external research groups have implemented a variety of programming languages on top of Truffle, including JavaScript, Ruby, R, Python, and Smalltalk. Several of them already exceed the performance of the best implementations of those languages that existed before.
Unifying Access Control & Information Flow: A Security Model for Programs Consisting of Trusted and Untrusted Code
We introduce a security model based on dual access control labels (called DAC) that enables both confidentiality and integrity in the same program. This is developed in the context of object-oriented languages and considers implicit flows arising from both branching and dynamic dispatch. Our DAC model overcomes the limitations of classical access control models such as those based on stack inspection. Our security model is, in general, neither transitive nor reflexive, and it considers both confidentiality and integrity. Traditional lattice-based security models are a special case of our security model. We show that our model satisfies a non-interference theorem. The theorem simultaneously guarantees a) from a confidentiality perspective, that an attacker cannot distinguish the low-level values associated with two computations that have different high-level inputs, and b) from an integrity perspective, that an attacker cannot distinguish the high-level values associated with two computations that have different low-level inputs. We also show that one can give the necessary security guarantees via a static program analysis.
Sulong: Memory Safe and Efficient Execution of LLVM-Based Languages
Memory errors in C/C++ can allow an attacker to read sensitive data, corrupt the memory, or crash the executing process. The renowned top 25 most dangerous software errors published by the SANS Institute, as well as recent security disasters such as Heartbleed, show how important it is to tackle memory safety for C/C++. We present Sulong, an efficient interpreter for LLVM-based languages that runs on the JVM. Sulong guarantees memory safety for C/C++ and other LLVM-based languages by using managed allocations and automatic memory management. Through dynamic compilation, Sulong will achieve peak performance close to state-of-the-art compilers such as GCC or Clang, which do not produce memory-safe code. By efficiently implementing memory safety, Sulong strives to be a real-world solution for mitigating software security problems.
Sulong - Execution of LLVM-Based Languages on the JVM
For the last decade, the Java Virtual Machine (JVM) has been a popular platform for hosting languages other than Java. Language implementation frameworks like Truffle allow the implementation of dynamic languages such as JavaScript or Ruby with competitive performance and completeness. However, statically typed languages are still rare under Truffle. We present Sulong, an LLVM IR interpreter that brings all LLVM-based languages, including C, C++, and Fortran, to the JVM in one stroke. Executing these languages on the JVM opens a wide area of future research, including high-performance interoperability between high-level and low-level languages, combination of static and dynamic optimizations, and memory-safe execution of otherwise unsafe and unmanaged languages.
An Experience Report: Efficient Analysis using Souffle
This abstract summarizes the key aspects of Souffle, which is an open-source Datalog engine used for static program analysis. It describes the overall approach of translating Datalog to C++ using an abstract machine and staged compilation. The novel aspects in Souffle include auto-index generation, representation of large relations, and techniques to exploit caches and parallel cores. It also identifies the issues of query planning and improved parallelism that need further exploration. The presentation will also include our experience in using Souffle in the context of vulnerability detection using points-to and other data flow based analyses.
ICT 3612/7204 Database Systems - Graph Databases
Slides for a lecture on Graph Databases at Griffith University (Brisbane, Australia) for third year undergraduate course on Database Management (ICT 3612/7204).
Specializing Ropes for Ruby
Ropes are a data structure for representing character strings via a binary tree of operation-labeled nodes. Both the nodes and the trees constructed from them are immutable, making ropes a persistent data structure. Ropes were designed to perform well with large strings, and in particular with concatenation of large strings. We present our findings in using ropes to implement mutable strings in JRuby+Truffle, an implementation of the Ruby programming language using a self-specializing abstract syntax tree interpreter and dynamic compilation. We extend ropes to support Ruby language features such as encodings, and refine operations to better support typical Ruby programs. We also use ropes to work around underlying limitations of the JVM platform in representing strings. Finally, we evaluate the performance of our implementation of ropes and demonstrate that they perform 0.9x–9.4x as fast as byte-array-based string representations in representative benchmarks.
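The structure itself is compact; below is a minimal Java sketch of a rope (illustrative names, not the JRuby+Truffle implementation). Concatenation allocates a single node and shares both operands, so it is O(1) and never copies the underlying character data:

    // A rope: an immutable binary tree whose leaves hold character data and
    // whose inner nodes label operations such as concatenation.
    abstract class Rope {
        abstract int length();
        abstract char charAt(int index);

        static Rope concat(Rope left, Rope right) { return new ConcatNode(left, right); }
    }

    final class LeafNode extends Rope {
        private final String data; // immutable payload
        LeafNode(String data) { this.data = data; }
        int length() { return data.length(); }
        char charAt(int i) { return data.charAt(i); }
    }

    final class ConcatNode extends Rope {
        private final Rope left, right;
        private final int length; // cached so length() is O(1)
        ConcatNode(Rope left, Rope right) {
            this.left = left;
            this.right = right;
            this.length = left.length() + right.length();
        }
        int length() { return length; }
        char charAt(int i) { // descend left or right of the split point
            return i < left.length() ? left.charAt(i) : right.charAt(i - left.length());
        }
    }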
Combining speculative optimizations with flexible scheduling of side-effects
Speculative optimizations allow compilers to optimize code based on assumptions that cannot be verified at compile time. Taking advantage of the specific run-time situation opens up more optimization possibilities. Speculative optimizations are key to the implementation of high-performance language runtimes. Using them requires cooperation between the just-in-time compiler and the runtime system, and influences the design and the implementation of both. New speculative optimizations, as well as their application in more dynamic languages, use these systems much more than current implementations were designed for. We first quantify the run time and memory footprint caused by their usage. We then propose a structure for compilers that separates the compilation process into two stages and helps to deal with these issues without giving up on other traditional optimizations. In the first stage, floating guards can be inserted for speculative optimizations; the guards are then fixed in the control flow at appropriate positions. In the second stage, side-effecting instructions can be moved or reordered. Using this framework, we present two optimizations that help reduce the run-time costs and the memory footprint. We study the effects of both stages as well as the effects of these two optimizations in the Graal compiler. We evaluate this on classical benchmarks targeting the JVM: SPECjvm2008, DaCapo and Scala-DaCapo. We also evaluate JavaScript benchmarks running on the Truffle platform, which uses the Graal compiler. We find that combining both stages can bring up to 84% improvement in performance (9% on average) and that our optimization of memory footprint can bring memory usage down by 27% to 92% (45% on average).
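The shape of a speculative optimization with a guard can be sketched in plain Java (all names invented): the fast path checks the speculated condition, and when the guard fails, control leaves the optimized path for a generic fallback, which is the role deoptimization plays in the compiled code:

    // Sketch of a guard protecting a speculatively devirtualized call.
    class SpeculativeCall {
        static final Class<?> SPECULATED = String.class; // profile said: always String

        static int lengthFast(Object receiver) {
            if (receiver.getClass() != SPECULATED) { // guard: check the assumption
                return lengthSlow(receiver);         // "deoptimize": leave the fast path
            }
            return ((String) receiver).length();     // inlined, optimized fast path
        }

        static int lengthSlow(Object receiver) {     // generic fallback
            return receiver.toString().length();
        }
    }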
Minimally Constrained Multilingual Word Embeddings via Artificial Code Switching
We present a method that consumes a large corpus of multilingual text and produces a single, unified word embedding in which the word vectors generalize across languages. In contrast to current approaches that require language identification, our method is agnostic about the languages in which the documents in the corpus are expressed, and does not rely on parallel corpora to constrain the spaces. Instead we utilize a small set of human-provided word translations, which are often freely and readily available. We can encode such word translations as hard constraints in the model’s objective functions; however, we find that we can more naturally constrain the space by allowing words in one language to borrow distributional statistics from context words in another language. We achieve this via a process we term artificial code-switching. As the name suggests, we induce code switching so that words across multiple languages appear in contexts together. Not only do embedding models trained on code-switched data learn common cross-lingual structure, the common structure allows an NLP model trained in a source language to generalize to multiple target languages (achieving up to 80% of the accuracy of models trained with target-language data).
Nesoi: compile time checking of transactional coverage in parallel programs.
In this paper we describe our implementation of Nesoi, a tool for statically checking the transactional requirements of a program. Nesoi categorizes the fields of each instance of an object in the program and reports missing and unrequired transactions at compile time. As transactional requirements are detected at the level of object fields in independent object instances, the fields that need to be considered for possible collisions in a transaction can be cleanly identified, reducing the possibility of false collisions. Running against a set of benchmarks, these fields account for just 2.5% of reads and 17-31% of writes within a transaction. Nesoi is constructed as a plugin for the Scala compiler and is integrated with the dataflow libraries used in the Teraflux project, providing support both for conventional programming models and for the dataflow + transactions model of the Teraflux project.
Attribute Extraction from Noisy Text Using Character-based Sequence Tagging Models
Attribute extraction is the problem of extracting structured key-value pairs from unstructured data. Many similar entity recognition problems are usually solved as a sequence labeling task in which the elements of the sequence are word tokens. While word tokens are suitable for newswire, for many types of data, from social media text to product descriptions, word tokens are problematic because simple regular-expression-based word tokenizers cannot accurately tokenize text that is inconsistently spaced. Instead, we propose a character-based sequence tagging approach that jointly tokenizes and tags tokens. We find that the character-based approach is surprisingly accurate both at tokenizing words and at inferring labels. We also propose an end-to-end system that uses pairwise entity linking models for normalizing the extracted values.
Minimally Constrained Multilingual Word Embeddings via Artificial Code Switching
We present a method that consumes a large corpus of multilingual text and produces a single, unified word embedding in which the word vectors generalize across languages. Our method is agnostic about the languages in which the documents in the corpus are expressed, and does not rely on parallel corpora to constrain the spaces. Instead we utilize a small set of human-provided word translations to artificially induce code switching, thus allowing words in multiple languages to appear in contexts together and share distributional information. We evaluate the embeddings on a new multilingual word analogy dataset. We also find that our embeddings allow an NLP model trained in one language to generalize to another, achieving up to 80% of the accuracy of an in-language model.
An Efficient and Generic Event-based Profiler Framework for Dynamic Languages
Profilers help programmers analyze their programs and identify performance bottlenecks. We implement a profiler framework that helps to compare and analyze programs implementing the same algorithms in different languages. Profiler implementers replicate common functionality in each language's profiler; we focus on building a generic profiler framework for dynamic languages to minimize this recurring implementation effort. We implement our profiler in a framework that optimizes abstract syntax tree (AST) interpreters using a just-in-time (JIT) compiler. We evaluate it on ZipPy and JRuby+Truffle, the Python and Ruby implementations in this framework, respectively. We show that our profiler runs faster than the existing profilers in these languages and requires modest implementation effort. Our profiler serves three purposes: 1) it helps users find the bottlenecks in their programs, 2) it helps language implementers improve the performance of their language implementation, and 3) it helps to compare and evaluate different languages on cross-language benchmarks.
Breaking Payloads with Runtime Code Stripping and Image Freezing
Fighting off attacks based on memory corruption vulnerabilities is hard, and a lot of research has been and is being conducted in this area. In our recent work we took a different approach and looked into breaking the payload of an attack. Current attacks assume that they have access to every piece of code and the entire platform API. In this talk, we present a novel defensive strategy that targets this assumption. We built a system that removes unused code from an application process to prevent attacks from using code and APIs that would otherwise be present in the process memory but are not normally used by the actual application. Our system is only active during process creation time and therefore incurs no runtime overhead and thus no performance degradation. Our system does not modify any executable files or shared libraries, as all actions are executed in memory only. We implemented our system for Windows 8.1 and tested it on real-world applications. Besides presenting our system, we also show the results of our investigation into the code overhead present in current applications.
Shoal: smart allocation and replication of memory for parallel programs
Modern NUMA multi-core machines exhibit complex latency and throughput characteristics, making it hard to allocate memory optimally for a given program’s access patterns. However, sub-optimal allocation can significantly impact performance of parallel programs. We present an array abstraction that allows data placement to be automatically inferred from program analysis, and implement the abstraction in Shoal, a runtime library for parallel programs on NUMA machines. In Shoal, arrays can be automatically replicated, distributed, or partitioned across NUMA domains based on annotating memory allocation statements to indicate access patterns. We further show how such annotations can be automatically provided by compilers for high-level domain-specific languages (for example, the Green-Marl graph language). Finally, we show how Shoal can exploit additional hardware such as programmable DMA copy engines to further improve parallel program performance. We demonstrate significant performance benefits from automatically selecting a good array implementation based on memory access patterns and machine characteristics. We present two case studies: (i) Green-Marl, a graph analytics workload using automatically annotated code based on information extracted from the high-level program and (ii) a manually-annotated version of the PARSEC Streamcluster benchmark.
Building Debuggers and Other Tools: We Can “Have it All” (Position Paper)
Software development tools that “instrument” running programs, notably debuggers, are presumed to demand difficult tradeoffs among performance, functionality, implementation complexity, and user convenience. A fundamental change in our thinking about such tools makes that presumption obsolete. By building instrumentation directly into the core of a high-performance language implementation framework, tool support can be always on, with confidence that optimization will apply uniformly to instrumentation and result in near zero overhead. Tools can be always available (and fast), not only for end user programmers, but also for language implementors throughout development.
Snippets: Taking the High Road to a Low Level
When building a compiler for a high-level language, certain intrinsic features of the language must be expressed in terms of the resulting low-level operations. Complex features are often expressed by explicitly weaving together bits of low-level IR, a process that is tedious, error prone, difficult to read, difficult to reason about, and machine dependent. In the Graal compiler for Java, we take a different approach: we use snippets of Java code to express semantics in a high-level, architecture-independent way. Two important restrictions make snippets feasible in practice: they are compiler specific, and they are explicitly prepared and specialized. Snippets make Graal simpler and more portable while still capable of generating machine code that can compete with other compilers of the Java HotSpot VM.
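To convey the flavour (signatures invented; this is not Graal's actual snippet API), the lowering of a type check can be written as ordinary Java, which the compiler parses, specializes for the concrete types it sees, and inlines as IR in place of hand-woven low-level nodes:

    // A lowering expressed as plain Java rather than hand-built IR (illustrative).
    class TypeCheckSnippets {
        // Expresses the semantics of "obj instanceof expected" as Java code.
        static boolean instanceofSnippet(Object obj, Class<?> expected) {
            if (obj == null) {
                return false;                // null is not an instance of anything
            }
            if (obj.getClass() == expected) {
                return true;                 // fast path: exact type match
            }
            return expected.isInstance(obj); // general case; folds away for leaf types
        }
    }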
Trash Day: Coordinating Garbage Collection in Distributed Systems
Cloud systems such as Hadoop, Spark and Zookeeper are frequently written in Java or other garbage-collected languages. However, GC-induced pauses can have a significant impact on these workloads. Specifically, GC pauses can reduce throughput for batch workloads, and cause high tail latencies for interactive applications. In this paper, we show that distributed applications suffer from each node’s language runtime system making GC-related decisions independently. We first demonstrate this problem on two widely-used systems (Apache Spark and Apache Cassandra). We then propose solving this problem using a Holistic Runtime System, a distributed language runtime that collectively manages runtime services across multiple nodes. We present initial results to demonstrate that this Holistic GC approach is effective both in reducing the impact of GC pauses on a batch workload, and in improving GC-related tail latencies in an interactive setting.
Making meaningful decisions about time, workload and pedagogy in the digital age: the Course Resource Appraisal Model
This article reports on a design-based research project to create a modelling tool to analyse the costs and learning benefits involved in different modes of study. The Course Resource Appraisal Model (CRAM) provides accurate cost-benefit information so that institutions are able to make more meaningful decisions about which kind of courses—online, blended or traditional face-to-face—make sense for them to provide. The tool calculates the difference between expenses and income over three iterations of the course and presents a pedagogical analysis of the learning experience provided. The article draws on a CRAM analysis of the costs and learning benefits of a massive open online course to show how the tool can illuminate the pedagogical and financial viability of a course of this kind.
Architectural support for task scheduling: hardware scheduling for dataflow on NUMA systems.
To harness the compute resources of a many-core system with tens to hundreds of cores, applications have to expose parallelism to the hardware. Researchers are aggressively looking for program execution models that make it easier to expose parallelism and use the available resources. One common approach is to decompose a program into parallel 'tasks' and allow an underlying system layer to schedule these tasks to different threads. Software-only schedulers can implement various scheduling policies and algorithms that match the characteristics of different applications and programming models. Unfortunately, with large-scale multi-core systems, software schedulers suffer significant overheads as they synchronize and communicate task information over deep cache hierarchies. To reduce these overheads, hardware-only schedulers like Carbon have been proposed to enable task queuing and scheduling to be done in hardware. This paper presents a hardware scheduling approach where the structure provided to programs by task-based programming models can be incorporated into the scheduler, making it aware of a task's data requirements. This prior knowledge of a task's data requirements allows for better task placement by the scheduler, which results in a reduction in overall cache misses and memory traffic, improving the program's performance and power utilization. Simulations of this technique for a range of synthetic benchmarks and components of real applications have shown a reduction in the number of cache misses by up to 72% and 95% for the L1 and L2 caches, respectively, and up to 30% improvement in overall execution time against FIFO scheduling. This results not only in faster execution and less data transfer, with reductions of up to 50% allowing for less load on the interconnect, but also in lower power consumption.
Shelf space product placement optimizer
A system for optimizing shelf space placement for a product receives decision variables and constraints, and executes a Randomized Search (“RS”) using the decision variables and constraints until an RS solution is below a pre-determined improvement threshold. The system then solves a Mixed-Integer Linear Program (“MILP”) problem using the decision variables and constraints, and using the RS solution as a starting point, to generate a MILP solution. The system repeats the RS executing and MILP solving as long as the MILP solution is not within a predetermined accuracy or does not exceed a predetermined time duration. The system then, based on the final MILP solution, outputs a shelf position and a number of facings for the product.
Java-to-JavaScript translation via structured control flow reconstruction of compiler IR
We present an approach to cross-compile Java bytecode to JavaScript, building on existing Java optimizing compiler technology. Static analysis determines which Java classes and methods are reachable. These are then translated to JavaScript using a re-configured Java just-in-time compiler with a new back end that generates JavaScript instead of machine code. Standard compiler optimizations such as method inlining and global value numbering, as well as advanced optimizations such as escape analysis, lead to compact and optimized JavaScript code. Compiler IR is unstructured, so structured control flow needs to be reconstructed before code generation is possible. We present details of our control flow reconstruction algorithm. Our system is based on Graal, an open-source optimizing compiler for the Java HotSpot VM and other VMs. The modular and VM-independent architecture of Graal allows us to reuse the intermediate representation, the bytecode parser, and the high-level optimizations. Our custom back end first performs control flow reconstruction and then JavaScript code generation. The generated JavaScript undergoes a set of optimizations to increase readability and performance. Static analysis is performed on the Graal intermediate representation as well. Medium-sized Java benchmarks such as SPECjbb2005 run with acceptable performance on the V8 JavaScript VM.
Augur: Data-Parallel Probabilistic Modelling
Implementing inference procedures for each new probabilistic model is time-consuming and error-prone. Probabilistic programming addresses this problem by allowing a user to specify the model and automatically generating the inference procedure. To make this practical it is important to generate high performance inference code. In turn, on modern architectures, high performance implies parallel execution. In this paper we present Augur, a probabilistic modelling language and compiler for Bayesian networks designed to make effective use of data-parallel architectures such as GPUs. We show that the compiler can generate data-parallel inference code scalable to thousands of GPU cores by making use of the conditional independence relationships in the Bayesian network.
Generalized decomposition for non-linear problems
Brief descriptions of RD applications to min cut, max cut, QAP, quadratic programming, and Rosenbrock functions.
Supporting Maintenance and Evolution of Access Control Models in Web Applications
This paper presents an approach to support the maintenance and evolution of Role-Based Access Control (RBAC) models with reverse-engineered Secure UML models. Starting from the Policy Decision Points (PDP) and Policy Enforcement Points (PEP) of an application, our approach statically reverse-engineers the implemented Secure UML model of the application. The Secure UML model is then stored in an RDF triple store for easy querying and exploration. In the context of this study, we extracted the Secure UML model of the GRAND Forum, a web-based forum for the members of the GRAND (Graphics, Animation and New Media) NCE (Networks of Centers of Excellence), which is developed and maintained at the University of Alberta. Using three real use-case scenarios, we illustrate how simple queries to the extracted Secure UML model can save developers significant amounts of manual work and support them in their access-control-related maintenance and evolution tasks.
Why Inheritance Anomaly Is Not Worth Solving
Modern computers improve on their predecessors with additional parallelism but require concurrent software to exploit it. Object-orientation is instrumental in simplifying sequential programming; however, in a concurrent setting, programmers adding new methods in a subclass typically have to modify the code of the superclass, which inhibits reuse, a problem known as inheritance anomaly. Researchers have made many efforts over the last two decades to solve the problem by deriving anomaly-free languages, yet these proposals have not ended up as practical solutions, so one may ask why. In this article, we investigate from a theoretical perspective whether a solution to the problem would introduce extra code complexity. We model object behavior as a regular language, and show that freedom from inheritance anomaly necessitates a language where ensuring Liskov-Wing substitutability becomes a language containment problem, which in our modeling is PSPACE-hard. This indicates that we cannot expect programmers to manually ensure that subtyping holds in an anomaly-free language. Anomaly freedom thus predictably leads to software bugs, and we doubt the value of providing it. From the practical perspective, the problem is already solved: inheritance anomaly is part of the general fragile base class problem of object-oriented programming, which arises due to code coupling in implementation inheritance. In modern software practice, the fragile base class problem is circumvented by interface abstraction to avoid implementation inheritance, and by opting for composition as the means for reuse. We discuss concurrent programming issues with composition for reuse.
Debugging At Full Speed
Debugging support for highly optimized execution environments is notoriously difficult to implement. The Truffle/Graal platform for implementing dynamic languages offers an opportunity to resolve the apparent trade-off between debugging and high performance. Truffle/Graal-implemented languages are expressed as abstract syntax tree (AST) interpreters. They enjoy competitive performance through platform support for type specialization, partial evaluation, and dynamic optimization/deoptimization. A prototype debugger for Ruby, implemented on this platform, demonstrates that basic debugging services can be implemented with modest effort and without significant impact on program performance. Prototyped functionality includes breakpoints, both simple and conditional, at lines and at local variable assignments. The debugger interacts with running programs by inserting additional nodes at strategic AST locations; these are semantically transparent by default, but when activated can observe and interrupt execution. By becoming in effect part of the executing program, these “wrapper” nodes are subject to full runtime optimization, and they incur zero runtime overhead when debugging actions are not activated. Conditions carry no overhead beyond evaluation of the expression, which is optimized in the same way as user code, greatly improving the prospects for capturing rarely manifested bugs. When a breakpoint interrupts program execution, the platform automatically restores the full execution state of the program (expressed as Java data structures), as if running in the unoptimized AST interpreter. This then allows full introspection of the execution data structures such as the AST and method activation frames when in the interactive debugger console. Our initial evaluation indicates that such support could be permanently enabled in production environments.
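The wrapper-node mechanism can be sketched as follows (invented names; the real implementation uses the platform's assumption/invalidation machinery rather than a plain flag): the wrapper delegates transparently to its child, and the breakpoint check compiles away entirely while no breakpoint is set:

    // Illustrative debugging wrapper around an AST node.
    abstract class AstNode {
        abstract Object execute();
    }

    class WrapperNode extends AstNode {
        private final AstNode child;
        private volatile boolean breakpointEnabled = false; // flipped by the debugger

        WrapperNode(AstNode child) { this.child = child; }

        Object execute() {
            if (breakpointEnabled) { // optimized away while no breakpoint is set
                hitBreakpoint();
            }
            return child.execute();  // transparent delegation to the real node
        }

        void hitBreakpoint() { /* suspend and hand control to the debugger console */ }
        void setBreakpoint(boolean on) { breakpointEnabled = on; }
    }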
Exploiting Implicit Parallelism in Dynamic Array Programming Languages
We have built an interpreter for the array programming language J. The interpreter exploits implicit data parallelism in the language to achieve good parallel speedups on a variety of benchmark applications. Many array programming languages operate on entire arrays without the need to write loops. Writing without loops simplifies programs, and array programs without loops allow an interpreter to parallelize the execution of the code without complex analysis or input from the programmer. The J programming language includes the usual idioms of operations on arrays of the same size and shape, where the operations can often be performed in parallel for each individual item of the operands. Another opportunity comes from J's reduction operations, where suitable operations can be performed in parallel for all the items of an operand. J has a notion of verb rank, which allows programmers to simplify programs by declaring how operations are applied to operands. The verb rank mechanism allows us to extract further parallelism. Our implementation of an implicitly parallelizing interpreter for J is written entirely in Java, in a framework that produces native code for the interpreter, giving good scalar performance. The interpreter itself is responsible for exploiting the parallelism available in the applications. Our results show we attain good parallel speed-up on a variety of benchmarks, including near-perfect linear speed-up on inherently parallel benchmarks. We believe that the lessons learned from our approach to exploiting data parallelism in an interpreter can be applied to other interpreted languages as well.
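The same implicit-parallelism principle can be seen in miniature with Java streams (our illustration; the paper's interpreter does this for J primitives): a whole-array operation written without a loop carries no ordering constraints, so the runtime is free to split it across cores, and an associative reduction parallelizes the same way:

    import java.util.stream.IntStream;

    // Loop-free whole-array operations whose parallelism is implicit.
    class ImplicitParallel {
        public static void main(String[] args) {
            int[] a = IntStream.range(0, 1_000_000).toArray();
            // Elementwise square (like an array-language primitive), in parallel.
            int[] squared = IntStream.of(a).parallel().map(x -> x * x).toArray();
            // Parallel reduction, valid because addition is associative.
            long sum = IntStream.of(squared).parallel().asLongStream().sum();
            System.out.println(sum);
        }
    }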
Towards Whatever-Scale Abstractions for Data-Driven Parallelism
Increasing diversity in computing systems often requires problems to be solved in quite different ways depending on the workload, data size, and resources available. This kind of diversity is becoming increasingly broad in terms of the organization, communication mechanisms, and the performance and cost characteristics of individual machines and clusters. Researchers have thus been motivated to design abstractions that allow programmers to express solutions independently of target execution platforms, enabling programs to scale from small shared-memory systems to distributed systems comprising thousands of processors. We call these abstractions “Whatever-Scale Computing”. In prior work, we have found data-driven parallelism to be a promising approach for solving many problems on shared-memory machines. In this paper, we describe ongoing work towards extending our previous abstractions to support data-driven parallelism for Whatever-Scale Computing. We plan to target rack-scale distributed systems. As an intermediate step, we have implemented a runtime system that treats a NUMA shared-memory system as if each NUMA domain were a node in a distributed system, using shared memory to implement communication between nodes.
Partial Escape Analysis and Scalar Replacement for Java
Escape Analysis allows a compiler to determine whether an object is accessible outside the allocating method or thread. This information is used to perform optimizations such as Scalar Replacement, Stack Allocation and Lock Elision, allowing modern dynamic compilers to remove some of the abstractions introduced by advanced programming models. The all-or-nothing approach taken by most Escape Analysis algorithms prevents all these optimizations as soon as there is one branch where the object escapes, no matter how unlikely this branch is at runtime. This paper presents a new, practical algorithm that performs control flow sensitive Partial Escape Analysis in a dynamic Java compiler. It allows Escape Analysis, Scalar Replacement and Lock Elision to be performed on individual branches. We implemented the algorithm on top of Graal, an open-source Java just-in-time compiler, and it performs well on a diverse set of benchmarks. In this paper, we evaluate the effect of Partial Escape Analysis on the DaCapo, ScalaDaCapo and SpecJBB2005 benchmarks, in terms of run-time, number and size of allocations and number of monitor operations. It performs particularly well in situations with additional levels of abstraction, such as code generated by the Scala compiler. It reduces the amount of allocated memory by up to 58.5%, and improves performance by up to 33%.
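A minimal Java example (ours) of the pattern Partial Escape Analysis exploits: the allocation escapes only on a rare branch, so the hot path can be compiled with the object scalar-replaced, and the object is materialized only if the branch is actually taken:

    // An allocation that escapes only on the unlikely debug branch.
    class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    class Pea {
        static java.util.List<Point> log = new java.util.ArrayList<>();

        static int distanceSquared(int x, int y, boolean debug) {
            Point p = new Point(x, y);    // virtual (scalar-replaced) on the hot path
            if (debug) {                  // unlikely branch
                log.add(p);               // p escapes only here: materialize it now
            }
            return p.x * p.x + p.y * p.y; // field reads fold to the scalar values
        }
    }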
One VM to rule them all
Building high-performance virtual machines is a complex and expensive undertaking; many popular languages still have low-performance implementations. We describe a new approach to virtual machine (VM) construction that amortizes much of the effort in initial construction by allowing new languages to be implemented with modest additional effort. The approach relies on abstract syntax tree (AST) interpretation where a node can rewrite itself to a more specialized or more general node, together with an optimizing compiler that exploits the structure of the interpreter. The compiler uses speculative assumptions and deoptimization in order to produce efficient machine code. Our initial experience suggests that high performance is attainable while preserving a modular and layered architecture, and that new high-performance language implementations can be obtained by writing little more than a stylized interpreter.
An intermediate representation for speculative optimizations in a dynamic compiler
We present a compiler intermediate representation (IR) that allows dynamic speculative optimizations for high-level languages. The IR is graph-based and contains nodes fixed to control flow as well as floating nodes. Side-effecting nodes include a framestate that maps values back to the original program. Guard nodes dynamically check assumptions and, on failure, deoptimize to the interpreter that continues execution. Guards implicitly use the framestate and program position of the last side-effecting node. Therefore, they can be represented as freely floating nodes in the IR. Exception edges are modeled as explicit control flow and are subject to full optimization. We use profiling and deoptimization to speculatively reduce the number of such edges. The IR is the core of a just-in-time compiler that is integrated with the Java HotSpot VM. We evaluate the design decisions of the IR using major Java benchmark suites.
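A source-level way to picture what a guard means (an illustrative analogy in Java, not the compiler's actual representation): the compiler speculates on a property, compiles only the fast path, and falls back to the interpreter when the speculation fails.

    /** Conceptual sketch only: the compiler speculates that shape is always a
     *  Circle and inlines the fast path behind a guard; in compiled code a
     *  failing guard deoptimizes to the interpreter using the framestate of
     *  the last side-effecting node, rather than throwing. */
    public final class GuardSketch {
        interface Shape { double area(); }
        static final class Circle implements Shape {
            final double radius;
            Circle(double radius) { this.radius = radius; }
            public double area() { return Math.PI * radius * radius; }
        }

        static double speculativeArea(Shape shape) {
            if (!(shape instanceof Circle)) {
                // Stands in for a guard node; real guards deoptimize instead.
                throw new IllegalStateException("deoptimize: speculation failed");
            }
            Circle c = (Circle) shape;
            return Math.PI * c.radius * c.radius;   // devirtualized, inlined path
        }

        public static void main(String[] args) {
            System.out.println(speculativeArea(new Circle(2.0)));
        }
    }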
Assessing Confidence of Knowledge Base Content with an Experimental Study in Entity Resolution
The purpose of this paper is to begin a conversation about the importance and role of confidence estimation in knowledge bases (KBs). KBs are never perfectly accurate, yet without confidence reporting their users are likely to treat them as if they were, possibly with serious real-world consequences. We define a notion of confidence based on the probability of a KB fact being true. For automatically constructed KBs we propose several algorithms for estimating this confidence from pre-existing probabilistic models of data integration and KB construction. In particular, this paper focusses on confidence estimation in entity resolution. A goal of our exposition here is to encourage creators and curators of KBs to include confidence estimates for entities and relations in their KBs.
A Joint Model for Discovering and Linking Entities
Entity resolution, the task of automatically determining which mentions refer to the same real-world entity, is a crucial aspect of knowledge base construction and management. However, performing entity resolution at large scales is challenging because (1) the inference algorithms must cope with unavoidable system scalability issues and (2) the search space grows exponentially in the number of mentions. Current conventional wisdom declares that performing coreference at these scales requires decomposing the problem by first solving the simpler task of entity-linking (matching a set of mentions to a known set of KB entities), and then performing entity discovery as a postprocessing step (to identify new entities not present in the KB). However, we argue that this traditional approach is harmful to both entity-linking and overall coreference accuracy. Therefore, we embrace the challenge of jointly modeling entity-linking and entity-discovery as a single entity resolution problem. In order to achieve scalability we (1) present a model that reasons over compact hierarchical entity representations, and (2) propose a novel distributed inference architecture that does not suffer from the synchronicity bottleneck which is inherent in map-reduce architectures. We demonstrate that more test-time data actually improves the accuracy of coreference, and show that the joint approach to coreference is substantially more accurate than traditional entity-linking, reducing error by over 75%.
Improved dataflow executions with user assisted scheduling.
In pure dataflow applications scheduling can have a huge effect on the memory footprint and number of active tasks in the program. However, in impure programs, scheduling not only affects the system resources, but can also affect the overall time complexity and accuracy of the program. To address both of these aspects this paper describes and analyses effective extensions to a dataflow scheduler to allow programmers to provide priority information describing the preferred execution order of a dataflow graph. We demonstrate that even very crude task priority metrics can be extremely effective, providing an average saving of 91% over the worst case scenario and 60% over the best case naive scenario. We also note that by specifying the scheduling information explicitly based on the algorithm, not the hardware, we provide portability to the application.
An Experimental Study of the Influence of Dynamic Compiler Optimizations on Scala Performance
Java Virtual Machines are optimized for performing well on traditional Java benchmarks, which consist almost exclusively of code generated by the Java source compiler (javac). Code generated by compilers for other languages has not received nearly as much attention, which results in performance problems for those languages. One important specimen of "another language" is Scala, whose syntax and features encourage a programming style that differs significantly from traditional Java code. It suffers from the same problem -- its code patterns are not optimized as well as the ones originating from Java code. JVM developers need to be aware of the differences between Java and Scala code, so that both types of code can be executed with optimal performance. This paper presents a detailed investigation of the performance impact of a large number of optimizations on the Scala DaCapo and the Java DaCapo benchmark suites. It describes the optimization techniques and analyzes the differences between traditional Java applications and Scala applications. The results help compiler engineers in understanding the characteristics of Scala. We performed these experiments on the work-in-progress Graal compiler. Graal is a new dynamic compiler for the HotSpot VM which aims to work well for a diverse set of workloads, including languages other than Java.
Min Cut Results from 2013
Randomized Decomposition results against state-of-the-art packages and best known solutions in 2013
Beyond Fano's Inequality: Bounds on the Optimal F-Score, BER, and Cost-Sensitive Risk and Their Implications
Fano's inequality lower bounds the probability of transmission error through a communication channel. Applied to classification problems, it provides a lower bound on the Bayes error rate and motivates the widely used Infomax principle. In modern machine learning, we are often interested in more than just the error rate. In medical diagnosis, different errors incur different cost; hence, the overall risk is cost-sensitive. Two other popular criteria are balanced error rate (BER) and F-score. In this work, we focus on the two-class problem and use a general definition of conditional entropy (including Shannon's as a special case) to derive upper/lower bounds on the optimal F-score, BER and cost-sensitive risk, extending Fano's result. As a consequence, we show that Infomax is not suitable for optimizing F-score or cost-sensitive risk, in that it can potentially lead to low F-score and high risk. For cost-sensitive risk, we propose a new conditional entropy formulation which avoids this inconsistency. In addition, we consider the common practice of using a threshold on the posterior probability to tune performance of a classifier. As is widely known, a threshold of 0.5, where the posteriors cross, minimizes error rate; we derive similar optimal thresholds for F-score and BER.
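The cost-sensitive case of that closing observation can be made concrete with the standard threshold derivation (our sketch of a textbook result; the paper's notation may differ). Writing \eta(x) = P(Y = 1 \mid x) and predicting class 1 whenever \eta(x) \ge t, the expected cost of predicting 1 is (1 - \eta(x))\,c_{FP} and of predicting 0 is \eta(x)\,c_{FN}, so the optimal rule is

    \text{predict } 1 \iff (1 - \eta(x))\,c_{FP} \le \eta(x)\,c_{FN}
                      \iff \eta(x) \ge t^{\ast} = \frac{c_{FP}}{c_{FP} + c_{FN}},

which reduces to the familiar t^{\ast} = 1/2 when c_{FP} = c_{FN}.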
Static Analysis by Elimination
In this paper we describe a program analysis technique for finding value ranges for variables in the LLVM compiler infrastructure. Range analysis has several important applications for embedded systems, including elimination of assertions in programs, automatically deducing numerical stability, eliminating array bounds checking, and integer overflow detection. Determining value ranges poses a major challenge in program analysis because it is difficult to ensure the termination and precision of the program analysis in the presence of program cycles. This work uses a technique where loops are detected intrinsically within the program analysis. Our work combines methods of elimination-based data flow analysis with abstract interpretation. We have implemented a prototype of the proposed framework in the LLVM compiler framework and have conducted experiments with a suite of test programs to show the feasibility of our approach.
Graal IR: An Extensible Declarative Intermediate Representation
We present an intermediate representation (IR) for a Java just-in-time (JIT) compiler written in Java. It is a graph-based IR that models both control-flow and data-flow dependencies between nodes. We show the framework in which we developed our IR. Much care has been taken to allow the programmer to focus on compiler optimization rather than IR bookkeeping. Edges between nodes are declared concisely using Java annotations, and common properties and functions on nodes are communicated to the framework by implementing interfaces. Building upon these declarations, the graph framework automatically implements a set of useful primitives that the programmer can use to implement optimizations.
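A hedged sketch of that declarative style (the annotation and base class below are minimal stand-ins we define ourselves, not the exact framework API):

    import java.lang.annotation.*;

    /** Minimal stand-ins illustrating annotation-declared IR edges. */
    public final class DeclarativeIrSketch {
        @Retention(RetentionPolicy.RUNTIME)
        @Target(ElementType.FIELD)
        @interface Input {}                 // marks a field as a data-flow edge

        static abstract class ValueNode {}  // stand-in for the node base class

        static final class AddNode extends ValueNode {
            @Input ValueNode x;             // edges are ordinary fields; the
            @Input ValueNode y;             // framework derives use/def bookkeeping

            AddNode(ValueNode x, ValueNode y) { this.x = x; this.y = y; }
        }

        public static void main(String[] args) {
            ValueNode a = new ValueNode() {};
            ValueNode b = new ValueNode() {};
            AddNode add = new AddNode(a, b);
            System.out.println(add.x == a && add.y == b); // true
        }
    }

The point of declaring edges this way is that a graph framework can reflect over the annotated fields once and then offer generic primitives (node replacement, usage iteration, graph copying) to every optimization.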
Reliable peer-to-peer connections
Embodiments of a system and method for establishing reliable connections between peers in a peer-to-peer networking environment. In one embodiment, a reliable communications channel may use transmit and receive windows, acknowledgement of received messages, and retransmission of messages not received to provide reliable delivery of messages between peers in the peer-to-peer environment. In one embodiment, each message may include a sequence number configured for use in maintaining ordering of received messages on a receiving peer. A communications channel may make multiple hops on a network, and different hops in the connection may use different underlying network protocols. Communications channels may also pass through one or more firewalls and/or one or more gateways on the network. A communications channel may also pass through one or more router (relay) peers on the network. The peers may adjust the sizes of the transmit and receive window based upon reliability of the connection.
The Potential to Coordinate Digital Simulations for UK-wide VET: Report to the Commission on Adult Vocational Teaching and Learning
The report provides insight and analysis of the opportunity and potential of simulation tools for education. In VET, learning and assessment are primarily practice-based. Consequently, many colleges build simulations of real-world locations, such as kitchens, hairdressing salons, garages, building sites, and, in land-based colleges, farms. A wide range of digital tools are then used to support, amplify or augment these real-world learning processes, to prepare learners for authentic workplace practice, aid reflection on practice, reinforce their practice-based learning, and to help with revision before assessments. We examine here the particular role of digital simulation technologies alongside other digital applications and conventional methods.
Communities of Practice
Communities of practice were first adopted in education as a theory of learning, and by business, particularly within organizational development, as a knowledge management approach. This chapter reviews the literature on communities of practice. It is organized into five sections which call out the major areas in which communities of practice have had an influence. Each section provides an overview of the literature on communities of practice for the following domains: communities of practice definitions and theory, identities and belonging, learning and teaching methods using the theory of communities of practice, workplace communities of practice, and virtual communities of practice. The chapter finally addresses communities of scientific practice through the use of a case study.
SIMMAT: A Metastability Analysis Tool
Presentation at the IEEE/ACM ICCAD 2012 Workshop on CAD for Multi-Synchronous and Asynchronous Circuits and Systems, 8 November 2012, Hilton San Jose, CA USA.
Compilation Queuing and Graph Caching for Dynamic Compilers
Modern virtual machines for Java use a dynamic compiler to optimize the program at run time. The compilation time therefore impacts the performance of the application in two ways: First, the compilation and the program's execution compete for CPU resources. Second, the sooner the compilation of a method finishes, the sooner the method will execute faster. In this paper, we present two strategies for mitigating the performance impact of a dynamic compiler. We introduce and evaluate a way to cache, reuse and, at the right time, evict the compiler's intermediate graph representation. This allows reuse of this graph when a method is inlined multiple times into other methods. We show that the combination of late inlining and graph caching is highly effective by evaluating the cache hit rate for several benchmarks. Additionally, we present a new mechanism for optimizing the order in which methods get compiled. We use a priority queue in order to make sure that the compiler processes the hottest methods of the program first. The machine code for hot methods is available earlier, which has a significant impact on the first benchmark. Our results show that our techniques can significantly improve the start-up performance of Java applications. The techniques are applicable to dynamic compilers for managed languages.
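A minimal sketch of the queuing idea (our illustration; the paper's actual priority metric and queue implementation may differ), ordering compilation requests so the hottest methods are compiled first:

    import java.util.PriorityQueue;

    /** Compilation requests ordered by hotness: hotter methods compile first. */
    public final class CompileQueueSketch {
        record Request(String method, long invocationCount) {}

        public static void main(String[] args) {
            PriorityQueue<Request> queue = new PriorityQueue<>(
                (a, b) -> Long.compare(b.invocationCount(), a.invocationCount()));

            queue.add(new Request("List.get", 10_000));
            queue.add(new Request("Main.init", 3));
            queue.add(new Request("Map.hash", 250_000));

            while (!queue.isEmpty()) {
                System.out.println("compile " + queue.poll().method());
            }
            // prints Map.hash, then List.get, then Main.init
        }
    }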
Intransitive noninterference in nondeterministic systems
This paper addresses the question of how TA-security, a semantics for intransitive information-flow policies in deterministic systems, can be generalized to nondeterministic systems. Various definitions are proposed, including definitions that state that the system enforces as much of the policy as possible in the context of attacks in which groups of agents collude by sharing information through channels that lie outside the system. Relationships between the various definitions proposed are characterized, and an unwinding-based proof technique is developed. Finally, it is shown that on a specific class of systems, access control systems with local non-determinism, the strongest definition can be verified by checking a simple static property.
RSSolver: A tool for solving large non-linear, non-convex discrete optimization problems
Describes the initial implementation of Randomized Decomposition (then called Randomized Search), with some numerical results on the price optimization and shelf-space optimization problems.
R2RML: RDB to RDF Mapping Language, W3C Recommendation
Co-editor of the W3C recommendation on describing a language to map relational databases to RDF datasets.
DFScala: high level dataflow support for Scala.
In this paper we present DFScala, a library for constructing and executing dataflow graphs in the Scala language. Through the use of Scala this library allows the programmer to construct coarse grained dataflow graphs that take advantage of functional semantics for the dataflow graph and both functional and imperative semantics within the dataflow nodes. This combination allows for very clean code which exhibits the properties of dataflow programs, but we believe is more accessible to imperative programmers. We first describe DFScala in detail, before using a number of benchmarks to evaluate both its scalability and its absolute performance relative to existing codes. DFScala has been constructed as part of the Teraflux project and is being used extensively as a basis for further research into dataflow programming.
Low-loss Low-crosstalk Silicon Rib Waveguide Crossing with Tapered Multimode-Interference Design
We report the design and fabrication of silicon rib-waveguide crossings based on taper-integrated multimode-interference. Measured devices built in a 130nm SOI CMOS process showed an insertion loss of 0.1dB/crossing, and an extracted crosstalk below -35dB.
A Discriminative Hierarchical Model for Fast Coreference at Large Scale
Methods that measure compatibility between mention pairs are currently the dominant approach to coreference. However, they suffer from a number of drawbacks including difficulties scaling to large numbers of mentions and limited representational power. As these drawbacks become increasingly restrictive, the need to replace the pairwise approaches with a more expressive, highly scalable alternative is becoming urgent. In this paper we propose a novel discriminative hierarchical model that recursively partitions entities into trees of latent sub-entities. These trees succinctly summarize the mentions providing a highly compact, information-rich structure for reasoning about entities and coreference uncertainty at massive scales. We demonstrate that the hierarchical model is several orders of magnitude faster than pairwise, allowing us to perform coreference on six million author mentions in under four hours on a single CPU.
MCMCMC: Efficient Inference by Approximate Sampling
Conditional random fields and other graphical models have achieved state of the art results in a variety of NLP and IE tasks including coreference and relation extraction. Increasingly, practitioners are using models with more complex structure—higher tree-width, larger fan-out, more features, and more data—rendering even approximate inference methods such as MCMC inefficient. In this paper we propose an alternative MCMC sampling scheme in which transition probabilities are approximated by sampling from the set of relevant factors. We demonstrate that our method converges more quickly than a traditional MCMC sampler for both marginal and MAP inference. In an author coreference task with over 5 million mentions, we achieve a 13 times speedup over regular MCMC inference.
Evaluating the Design of the R Language - Objects and Functions for Data Analysis
R is a dynamic language for statistical computing that combines lazy functional features and object-oriented programming. This rather unlikely linguistic cocktail would probably never have been prepared by computer scientists, yet the language has become surprisingly popular. With millions of lines of R code available in repositories, we have an opportunity to evaluate the fundamental choices underlying the R language design. Using a combination of static and dynamic program analysis we assess the success of different language features.
Informative Priors for Markov Blanket Discovery
We present a novel interpretation of information theoretic feature selection as optimization of a discriminative model. We show that this formulation coincides with a group of mutual information based filter heuristics in the literature, and show how our probabilistic framework gives a well-founded extension for informative priors. We then derive a particular sparsity prior that recovers the well-known IAMB algorithm (Tsamardinos & Aliferis, 2003) and extend it to create a novel algorithm, IAMB-IP, that includes domain knowledge priors. In empirical evaluations, we find the new algorithm to improve Markov Blanket recovery even when a misspecified prior was used, in which half the prior knowledge was incorrect.
Solving retail space optimization problem using the randomized search algorithm
An application of Randomized Decomposition (RD) to shelf-space optimization.
Resource-bounded Information Acquisition and Learning
In many scenarios it is desirable to augment existing data with information acquired from an external source. For example, information from the Web can be used to fill missing values in a database or to correct errors. In many machine learning and data mining scenarios, acquiring additional feature values can lead to improved data quality and accuracy. However, there is often a cost associated with such information acquisition, and we typically need to operate under limited resources. In this thesis, I explore different aspects of Resource-bounded Information Acquisition and Learning. The process of acquiring information from an external source involves multiple steps, such as deciding what subset of information to obtain, locating the documents that contain the required information, acquiring relevant documents, extracting the specific piece of information, and combining it with existing information to make useful decisions. The problem of Resource-bounded Information Acquisition (RBIA) involves saving resources at each stage of the information acquisition process. I explore four special cases of the RBIA problem, propose general principles for efficiently acquiring external information in real-world domains, and demonstrate their effectiveness using extensive experiments. For example, in some of these domains I show how interdependency between fields or records in the data can also be exploited to achieve cost reduction. Finally, I propose a general framework for RBIA that takes into account the state of the database at each point of time, dynamically adapts to the results of all the steps in the acquisition process so far, as well as the properties of each step, and carries them out striving to acquire the most information with the least amount of resources.
A case for exiting a transaction in the context of hardware transactional memory.
Despite the rapid growth in the area of Transactional Memory (TM), there is a lack of standardisation of certain features. The behaviour of a transactional abort is one such feature. All hardware TM and most software TM designs treat abort as a way of restarting the current transaction. However an alternative representation for the same functionality has been expressed in some software transactional memories and programming languages proposals. These allow the termination of a transaction without restarting. In this paper we argue that similar functionality is required for hardware TM as well. We call this functionality Exit Transaction, in which a programmer can explicitly ask the underlying TM system to move to the end of the transaction without committing it. We discuss how to extend a hardware TM system to support such a feature and our evaluation with two hardware TM systems shows that by using this functionality a speedup of up to 1.35X can be achieved on the benchmarks tested. This is achieved as a result of lower contention for resources and fewer false positives.
“Dual-Purpose” Remateable Conductive Ball-in-Pit Interconnects for Chip Powering and Passive Alignment in Proximity Communication Enabled Multi-Chip Packages
with Hiren Thacker, Ivan Shubin, Ying Luo, Kannan Raj, Ashok Krishnamoorthy and John Cunningham
Yes, There is an "Expertise Gap" in HPC Applications Development
The High Productivity Computing Systems (HPCS) program seeks a tenfold productivity increase in High Performance Computing (HPC), where productivity is understood to be a composite of system performance, system robustness, programmability, portability, and administrative concerns. Of these, programmability is the least well understood and perceived to be the most problematic. It has been suggested that an "expertise gap" is at the heart of the problem in HPC application development. Preliminary results from research conducted by Sun Microsystems and other participants in the HPCS program confirm that such an "expertise gap" does exist and does exert a significant confounding influence on HPC application development. Further, the nature of the "expertise gap" appears not to be amenable to previously proposed solutions such as "more education" and "more people." A productivity improvement of the scale sought by the HPCS program will require fundamental transformations in the way HPC applications are developed and maintained.
Selecting Actions for Resource-bounded Information Extraction using Reinforcement Learning
Given a database with missing or uncertain content, our goal is to correct and fill the database by extracting specific information from a large corpus such as the Web, and to do so under resource limitations. We formulate the information gathering task as a series of choices among alternative, resource-consuming actions and use reinforcement learning to select the best action at each time step. We use the temporal difference Q-learning method to train the function that selects these actions, and compare it to an online, error-driven algorithm called SampleRank. We present a system that finds information such as email, job title and department affiliation for the faculty at our university, and show that the learning-based approach accomplishes this task efficiently under a limited action budget. Our evaluations show that we can obtain 92.4% of the final F1 by only using 14.3% of all possible actions.
Grating-Coupler Based Low-Loss Optical Interlayer Coupling
IEEE Group IV Photonics
Applying dataflow and transactions to Lee routing.
Programming multicore shared-memory systems is a challenging combination of exposing parallelism in your program and communicating between the resulting parallel paths of execution. The burden of communication can introduce complexity that is hard to separate from the pure expression of the algorithm and can negate the performance that is gained from parallelism. We are extending the Scala language with dataflow for creating parallelism and transactions for the controlled mutation of shared state. We take an early look at applying this work to Lee's algorithm for routing circuit boards and consider the potential benefits of programming with this system with regard to the elegance of expression and the resulting performance. We show how our approach reduces the number of lines of code and synchronisation operations needed, at the same time as improving real-world performance.
Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection
We present a unifying framework for information theoretic feature selection, bringing almost two decades of research on heuristic filter criteria under a single theoretical interpretation. This is in response to the question: "what are the implicit statistical assumptions of feature selection criteria based on mutual information?". To answer this, we adopt a different strategy than is usual in the feature selection literature: instead of trying to define a criterion, we derive one, directly from a clearly specified objective function: the conditional likelihood of the training labels. While many hand-designed heuristic criteria try to optimize a definition of feature 'relevancy' and 'redundancy', our approach leads to a probabilistic framework which naturally incorporates these concepts. As a result we can unify the numerous criteria published over the last two decades, and show them to be low-order approximations to the exact (but intractable) optimisation problem. The primary contribution is to show that common heuristics for information based feature selection (including Markov Blanket algorithms as a special case) are approximate iterative maximisers of the conditional likelihood. A large empirical study provides strong evidence to favour certain classes of criteria, in particular those that balance the relative size of the relevancy/redundancy terms. Overall we conclude that the JMI criterion (Yang and Moody, 1999; Meyer et al., 2008) provides the best tradeoff in terms of accuracy, stability, and flexibility with small data samples.
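For reference, the JMI criterion favored above scores a candidate feature X_k against the already-selected set S as (the standard form from the cited literature; a sketch, not the paper's full derivation)

    J_{\mathrm{JMI}}(X_k) \;=\; \sum_{X_j \in S} I\big(X_k, X_j \,;\, Y\big),

greedily adding the highest-scoring candidate at each step, which is how it balances the relevancy and redundancy terms the abstract refers to.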
10Gbps, 530 fJ/b Optical Transceiver Circuits in 40nm CMOS
IEEE Symp. VLSI Circ.
Integration and packaging of a macrochip with silicon nanophotonic links
in press, IEEE Journal of Selected Topics in Quantum Electronics, special issue on Packaging and Integration technologies for Optical MEMS/NEMS, Optoelectronic and Nanophotonic Devices, 2011.
Learning to Select Actions for Resource-bounded Information Extraction
Given a database with missing or uncertain information, our goal is to extract specific information from a large corpus such as the Web under limited resources. We cast the information gathering task as a series of alternative, resource-consuming actions to choose from and propose a new algorithm for learning to select the best action to perform at each time step. The function that selects these actions is trained using an online, error-driven algorithm called SampleRank. We present a system that finds the faculty directory pages of top Computer Science departments in the U.S. and show that the learning-based approach accomplishes this task very efficiently under a limited action budget, obtaining approximately 90% of the overall F1 using less than 2% of actions. If we apply our method to the task of filling missing values in a large scale database with millions of rows and a large number of columns, the system can obtain just the required information from the Web very efficiently.
Architecture of the JInterval library
This is a translation from Russian of the paper for the conference "Statistics, Simulation, Optimization - 2011" to be held in Chelyabinsk, Russia. The JInterval library is an interval arithmetic library for Java. It was developed in collaboration with the Altai University, Barnaul, Russia. This paper presents the key architectural decisions made when designing JInterval library. It discusses compliance with functional requirements of the library as well as the current status of JInterval.
25Gb/s 1V-driving CMOS ring modulator with integrated thermal tuning
We report a high-speed ring modulator that has many of the qualities ideal for optical interconnect in future exascale supercomputers. The device was fabricated in a 130nm SOI CMOS process, with a 7.5µm ring radius. Its high-speed section, employing a PN junction that works in carrier-depletion mode, enables 25Gb/s modulation and an extinction ratio >5dB with only 1V peak-to-peak driving. Its thermal tuning section allows the device to work over a broad wavelength range, with a tuning efficiency of 0.19nm/mW. Based on microwave characterization and circuit modeling, the modulation energy is estimated at ~7fJ/bit. The whole device fits in a compact 400µm² footprint.
Relating similar terms for information retrieval
A resource analyzer selects a resource (e.g., a document) from a grouping of resources. The grouping of resources can be any type of social tagging system used for information retrieval. The selected resource has an assigned uncontrolled tag and an assigned controlled tag. The controlled tag is a term derived from a controlled vocabulary of terms. Having selected the resource for analyzing, the resource analyzer identifies a first set of resources in the grouping of resources having also been assigned a same value as the uncontrolled tag as the selected resource. Similarly, the resource analyzer identifies a second set of resources in the grouping of resources having also been assigned a same value as the controlled tag. With this information, the resource analyzer then produces a comparison result indicative of a similarity between the first set of resources and the second set of resources.
System Considerations for Capacitive Chip-to-Chip Signaling
This paper is a submission to the IEEE Radio Frequency Integration Technology (RFIT) conference, http://www.ieee-rfit.org. This is an invited submission to be presented at a special session on "Wireless Replacement of Wireline I/O." Conference Date: Nov 30 - Dec 2, 2011; Location: Beijing, China.
The SOM Family: Virtual Machines for Teaching and Research
The talk gives an overview of the development of a family of Smalltalk virtual machine implementations called SOM (Simple Object Machine). The SOM VM, originating from the University of Aarhus, Denmark, has been ported to several programming languages, exploring different objectives. All of the VM implementations focus on providing an easily accessible workbench for teaching, but have also turned out to be a viable research platform. In this talk, each of the SOM VMs will be briefly described along with the results that were achieved in applying it in teaching at both undergraduate and graduate levels as well as research.
Simple Low-Jitter Scheduler
To appear at High Performance Switching and Routing Conference (HPSR), Cartagena, Spain
Query-Aware MCMC
Traditional approaches to probabilistic inference such as loopy belief propagation and Gibbs sampling typically compute marginals for all the unobserved variables in a graphical model. However, in many real-world applications the user’s interests are focused on a subset of the variables, specified by a query. In this case it would be wasteful to uniformly sample, say, one million variables when the query concerns only ten. In this paper we propose a query-specific approach to MCMC that accounts for the query variables and their generalized mutual information with neighboring variables in order to achieve higher computational efficiency. Surprisingly there has been almost no previous work on query-aware MCMC. We demonstrate the success of our approach with positive experimental results on a wide range of graphical models.
SampleRank: Training Factor Graphs with Atomic Gradients
We present SampleRank, an alternative to contrastive divergence (CD) for estimating parameters in complex graphical models. SampleRank harnesses a user-provided loss function to distribute stochastic gradients across an MCMC chain. As a result, parameter updates can be computed between arbitrary MCMC states. SampleRank is not only faster than CD, but also achieves better accuracy in practice (up to 23% error reduction on noun-phrase coreference).
A Network Architecture for the Web of Things
The "Web of Things" is emerging as an exciting vision for seamlessly integrating everyday objects like home appliances, digital picture frames, health monitoring devices and energy meters into the Internet using the Web's well-known stan- dards and blueprints. The key idea is to represent resources on these devices as URIs and use HTTP verbs (GET, PUT, POST, DELETE) as the uniform interface to manipulate them. Unfortunately, practical considerations such as band- width or energy constraints, rewalls/NATs and mobility pose interesting challenges in the realization of this ideal vi- sion. This paper describes these challenges, identies some potential solutions and presents the design and implemen- tation of a gateway-based network architecture to address these concerns. To the best of our knowledge, it represents the rst attempt within the Web of Things community to tackle these issues in a comprehensive manner.
Conversion of Decimal Strings to Floating-Point Numbers
Although floating-point operations are accurately implemented in modern computers, the conversion of numbers from strings of decimal digits (text) to floating-point is difficult and often inaccurate. The difficulty of this conversion has even been used by hackers to attack weak points in systems. This report explores text-to-floating-point conversion and discusses possible performance improvements.
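A quick way to see the problem (our illustrative Java, not from the report): many short decimal strings have no exact binary floating-point representation, so conversion must find the nearest representable value.

    import java.math.BigDecimal;

    /** Shows the exact double that the string "0.1" converts to. */
    public final class DecimalConversion {
        public static void main(String[] args) {
            double d = Double.parseDouble("0.1");
            // BigDecimal(double) prints the exact binary value of d.
            System.out.println(new BigDecimal(d));
            // 0.1000000000000000055511151231257827021181583404541015625
            // Correct conversion must round to the nearest double; naive
            // digit-by-digit accumulation can pick a neighboring value.
        }
    }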
Exploiting CMOS Manufacturing to Reduce Tuning Requirements for Resonant Optical Devices
IEEE Photonics Journal
MUTS: native Scala constructs for software transactional memory.
In this paper we argue that the current approaches to implementing transactional memory in Scala, while very clean, adversely affect the programmability, readability and maintainability of transactional code. These problems occur out of a desire to avoid making modifications to the Scala compiler. As an alternative we introduce Manchester University Transactions for Scala (MUTS), which instead adds keywords to the Scala compiler to allow for the implementation of transactions through traditional block syntax such as that used in “while” statements. This allows for transactions that do not require a change of syntax style and do not restrict their granularity to whole classes or methods. While implementing MUTS does require some changes to the compiler’s parser, no further changes are required to the compiler. This is achieved by the parser describing the transactions in terms of existing constructs of the abstract syntax tree, and the use of Java Agents to rewrite the resulting class files once the compiler has completed. In addition to being an effective way of implementing transactional memory, this technique has the potential to be used as a light-weight way of adding support for additional Scala functionality to the Scala compiler.
An Evaluation of Asynchronous Stacks
We present an evaluation of some novel hardware implementations of a stack. All designs are asynchronous, fast, and energy efficient, while occupying modest area. We implemented a hybrid of two stack designs that can contain 42 data items with a family of GasP circuits. Measurements from the actual chip show that the chip functions correctly at speeds of up to 2.7 GHz in a 180 nm TSMC process at 2V. The energy consumption per stack operation depends on the number of data movements in the stack, which grows very slowly with the number of data items in the stack. We present a simple technique to measure separately the dynamic and static energy consumption of the complete chip as well as individual data movements in the stack. The average dynamic energy per move in the stack varies between 6pJ and 8pJ depending on the type of move.
Revisiting Condition Variables and Transactions
Prior condition synchronization primitives for memory transactions either force waiting transactions to abort (the retry construct), or force them to commit (also called punctuation in the literature). Although these primitives are useful in some settings, they do not enable programmers to conveniently express idioms that require synchronous communication (e.g., n-way rendezvous operations) between transactions. We present xCondition, a new form of condition variable that neither forces transactions to abort, nor to commit. Instead, an xCondition creates dependencies between the waiting and the corresponding notifying transactions such that the waiter can commit only if the corresponding notifier commits. If waiters and notifiers form dependency cycles (for instance, in synchronous communication idioms), they must commit or abort together. The xCondition construct builds on our earlier work on transaction communicators. We describe how to use xConditions in conjunction with communicators to enable effective coordination and communication between concurrent transactions. We illustrate the use of xConditions, and describe their implementation in the Maxine VM.
Progress in low-power switched optical interconnects
in press, IEEE Journal of Selected Topics in Quantum Electronics, special issue on Green Photonics, 2011.
High-efficiency 25Gb/s CMOS ring modulator with integrated thermal tuning
We report a 25Gb/s ring modulator with integrated thermal tuning fabricated in a 130nm CMOS process. With 2Vpp modulation, the optical eye shows >6dB extinction ratio. Modulation energy is estimated <24fJ/bit from circuit modeling.
Max Cut Results, 2011
Results for the Max Cutset problem from 2011, compared to the state of the art in 2011
A framework for reasoning about inherent parallelism in modern object-oriented languages
With the emergence of multi-core processors into the mainstream, parallel programming is no longer the specialized domain it once was. There is a growing need for systems to allow programmers to more easily reason about data dependencies and inherent parallelism in general purpose programs. Many of these programs are written in popular imperative programming languages like Java and C#. In this thesis I present a system for reasoning about side-effects of evaluation in an abstract and composable manner that is suitable for use by both programmers and automated tools such as compilers. The goal of developing such a system is to both facilitate the automatic exploitation of the inherent parallelism present in imperative programs and to allow programmers to reason about dependencies which may be limiting the parallelism available for exploitation in their applications. Previous work on languages and type systems for parallel computing has tended to focus on providing the programmer with tools to facilitate the manual parallelization of programs; programmers must decide when and where it is safe to employ parallelism without the assistance of the compiler or other automated tools. None of the existing systems combine abstraction and composition with parallelization and correctness checking to produce a framework which helps both programmers and automated tools to reason about inherent parallelism. In this work I present a system for abstractly reasoning about side-effects and data dependencies in modern, imperative, object-oriented languages using a type and effect system based on ideas from Ownership Types. I have developed sufficient conditions for the safe, automated detection and exploitation of a number of task, data and loop parallelism patterns in terms of ownership relationships. To validate my work, I have applied my ideas to the C# version 3.0 language to produce a language extension called Zal. I have implemented a compiler for the Zal language as an extension of the GPC# research compiler as a proof of concept of my system. I have used it to parallelize a number of real-world applications to demonstrate the feasibility of my proposed approach. In addition to this empirical validation, I present an argument for the correctness of the type system and language semantics I have proposed as well as sketches of proofs for the correctness of the sufficient conditions for parallelization proposed.
A Novel MCM Package Enabling Proximity Communication I-O
A novel packaging approach is described that is based on micro-machined features integrated into CMOS chips. Our solution combines two key self-alignment mechanisms for the first time: solder reflow self-alignment and a novel micro-ball and pyramidal pit for passive self-alignment. We report on the demonstration of an MCM package with large-footprint semiconductor CMOS chips interconnected by Proximity Communication (PxC), characterization of their high-accuracy assembly process, and metrology of the resulting chip misalignment. Our goal is to develop a scalable, lead-free packaging approach by which large NxN PxC-enabled chip arrays are assembled with high precision on organic substrates in a cost-effective manner while using industry-standard parts and tooling.
Experimental studies of the Franz-Keldysh effect in CVD grown GeSi epi on SOI
Electroabsorption from GeSi on silicon-on-insulator (SOI) is expected to have promising potential for optical modulation due to its low power consumption, small footprint, and more importantly, wide spectral bandwidth for wavelength division multiplexing (WDM) applications. Germanium, as a bulk crystal, has a sharp absorption edge with a strong coefficient at the direct band gap close to the C-band wavelength. Unfortunately, when integrated onto Silicon, or when alloyed with dilute Si for blueshifting to the C-band operation, this strong Franz-Keldysh (FK) effect in bulk Ge is expected to degrade. Here, we report experimental results for GeSi epi when grown under a variety of conditions such as different Si alloy content, under selective versus non-selective growth modes for both Silicon and SOI substrates. We compare the measured FK effect to the bulk Ge material. Reduced pressure CVD growth of GeSi heteroepitaxy with various Si content was studied by different characterization tools: X-ray diffraction (XRD), atomic force microscopy (AFM), secondary ion mass spectrometry (SIMS), Hall measurement and optical transmission/absorption to analyze performance for 1550 nm operation. State-of-the-art GeSi epi with low defect density and low root-mean-square (RMS) roughness were fabricated into p-i-n diodes and tested in a surface-normal geometry. They exhibit low dark current density of 5 mA/cm2 at 1V reverse bias with breakdown voltages of 45 Volts. Strong electroabsorption was observed in our GeSi alloy with 0.6% Si content having maximum absorption contrast of ∆α/α ~5 at 1580 nm at 75 kV/cm.
Immersive Education Spaces using Open Wonderland: From Pedagogy through to Practice
This chapter presents a case study of the use of virtual world environment in UK Higher Education. It reports on the activities carried out as part of the SIMiLLE (System for an Immersive and Mixed reality Language Learning) project to create a culturally sensitive virtual world to support language learning (funded by the UK government JISC program). The SIMiLLE project built on an earlier project called MiRTLE, which created a mixed-reality space for teaching and learning. The aim of the SIMiLLE project was to investigate the technical feasibility and pedagogical value of using virtual environments to provide a realistic socio-cultural setting for language learning interaction. The chapter begins by providing some background information on the Wonderland platform and the MiRTLE project, and then outlines the requirements for SIMiLLE, and how these requirements were supported through the use of a virtual world based on the Open Wonderland virtual world platform. The chapter then presents the framework used for the evaluation of the system, with a particular focus on the importance of incorporating pedagogy into the design of these systems, and how to support good practice with the ever-growing use of 3D virtual environments in formalized education. Finally, the results from the formative and summative evaluations are summarized, and the lessons learnt are presented, which can help inform future uses of immersive education spaces within Higher Education.
+SPACES: Serious Games for Role-Playing Government Policies
The paper explores how role-play simulations can be used to support policy discussion and refinement in virtual worlds. Although the work described is set primarily within the context of policy formulation for government, the lessons learnt are applicable to online learning and collaboration within virtual environments. The paper describes how the +Spaces project is using both 2D and 3D virtual spaces to engage with citizens to explore issues relevant to new government policies. It also focuses on the most challenging part of the project, which is to provide environments that can simulate some of the complexities of real life. Some examples of different approaches to simulation in virtual spaces are provided and the issues associated with them are further examined. We conclude that the use of role-play simulations seem to offer the most benefits in terms of providing a generalizable framework for citizens to engage with real issues arising from future policy decisions. Role-plays have also been shown to be a useful tool for engaging learners in the complexities of real-world issues, often generating insights which would not be possible using more conventional techniques.
Using virtual worlds for online role-play
The paper explores the use of virtual worlds to support online role-play as a collaborative activity. This paper describes some of the challenges involved in building online role-play environments in a virtual world and presents some of the ideas being explored by the project in the role-play applications being developed. Finally we explore how this can be used within the context of immersive education and 3D collaborative environments.
A Power-Efficient Network On-Chip Topology
International Workshop on Interconnection Network Architecture: On-Chip, Multi-Chip, New York, NY, 2011
A sub-picojoule-per-bit CMOS silicon photonic receiver
Laser Focus World, 2010.
Efficient Coroutines for the Java Platform
Coroutines are non-preemptive light-weight processes. Their advantage over threads is that they do not have to be synchronized because they pass control to each other explicitly and deterministically. Coroutines are therefore an elegant and efficient implementation construct for numerous algorithmic problems. Many mainstream languages and runtime environments, however, do not provide a coroutine implementation. Even if they do, these implementations often have less than optimal performance characteristics because of the tradeoff between run time and memory efficiency. As more and more languages are implemented on top of the Java virtual machine (JVM), many of which provide coroutine-like language features, the need for a coroutine implementation has emerged. We present an implementation of coroutines in the JVM that efficiently handles a large range of workloads. It imposes no overhead for applications that do not use coroutines and performs well for applications that do. For evaluation purposes, we use our coroutines to implement JRuby fibers, which leads to a significant speedup of certain JRuby programs. We also present general benchmarks that show the performance of our approach and outline its run-time and memory characteristics.
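The control transfer coroutines provide can be pictured with a generator. The sketch below emulates yield/resume with a thread and a SynchronousQueue purely for illustration; the paper's implementation lives inside the JVM precisely to avoid this kind of per-coroutine thread cost.

    import java.util.concurrent.SynchronousQueue;

    /** Generator-style coroutine emulated with a thread (illustration only). */
    public final class CoroutineSketch {
        static final class Generator {
            private final SynchronousQueue<Integer> channel = new SynchronousQueue<>();

            Generator() {
                Thread producer = new Thread(() -> {
                    try {
                        for (int i = 1; i <= 3; i++) {
                            channel.put(i);   // "yield i": block until resumed
                        }
                    } catch (InterruptedException ignored) { }
                });
                producer.setDaemon(true);
                producer.start();
            }

            int resume() throws InterruptedException {
                return channel.take();        // transfer control to the coroutine
            }
        }

        public static void main(String[] args) throws InterruptedException {
            Generator g = new Generator();
            System.out.println(g.resume());   // 1
            System.out.println(g.resume());   // 2
            System.out.println(g.resume());   // 3
        }
    }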
Dynamic Code Evolution for Java
Dynamic code evolution is a technique to update a program while it is running. In an object-oriented language such as Java, this can be seen as replacing a set of classes by new versions. We modified an existing high-performance virtual machine to allow arbitrary changes to the definition of loaded classes. Besides adding and deleting fields and methods, we also allow any kind of changes to the class and interface hierarchy. Our approach focuses on increasing developer productivity during debugging. Changes can be applied at any point a Java program can be suspended. The evaluation section shows that our modifications to the virtual machine have no negative performance impact on normal program execution. The fast in-place instance update algorithm ensures that the performance characteristics of a change are comparable with performing a full garbage collection run. Standard Java development environments are capable of using the code evolution features of our modified virtual machine, so no additional tools are required.
Environmental considerations when measuring relative performance of graphics cards.
In this paper we examine some of the environmental conditions that have to be considered when comparing the performance of GPUs to CPUs. These considerations vary greatly, from the differing ages of the hardware used to the effects of running the GPU code before the CPU code within the same binary; the latter has some quite surprising effects on the system as a whole. We then go on to test the different hardware's performance at matrix multiplication using both their basic linear algebra libraries and hand-coded functions. This is done while respecting the considerations we have described earlier in the paper, and addressing the problem in a way that, through the use of the Intel MKL library, cannot be argued to be unfair to the CPU.
How good is a span of terms? Exploiting proximity to improve web retrieval
Ranking search results is a fundamental problem in information retrieval. In this paper we explore whether the use of proximity and phrase information can improve web retrieval accuracy. We build on existing research by incorporating novel ranking features based on flexible proximity terms with recent state-of-the-art machine learning ranking models. We introduce a method of determining the goodness of a set of proximity terms that takes advantage of the structured nature of web documents, document metadata, and phrasal information from search engine user query logs. We perform experiments on a large real-world Web data collection and show that using the goodness score of flexible proximity terms can improve ranking accuracy over state-of-the-art ranking methods by as much as 13%. We also show that we can improve accuracy on the hardest queries by as much as 9% relative to state-of-the-art approaches.
Wafer-Testing of Optoelectronic-Gigascale CMOS Integrated Circuits
Gigascale integrated (GSI) chips with high-bandwidth, integrated optoelectronic (OE) and photonic components are an emerging technology. In this paper, we present the prospects and opportunities for wafer-testing of chips with electrical and optical I/O interconnects. The issues and requirements of testing OE-GSI chips during high-volume manufacturing are identified and discussed. Two probe substrate technologies based on microelectromechanical systems (MEMS) for simultaneously interfacing a multitude of surface-normal optical I/Os and high-density electrical I/Os are detailed. The first probe substrate comprises vertically compliant probes for contacting electrical I/Os and grating-in-waveguide optical probes for optical I/O coupling. The second MEMS probe module uses microsockets and through-substrate vias (TSVs) to contact pillar-shaped electrical and optical I/Os and to redistribute the signals, respectively.
Resource-bounded Information Extraction: Acquiring Missing Feature Values On Demand
We present a general framework for the task of extracting specific information "on demand" from a large corpus such as the Web under resource constraints. Given a database with missing or uncertain information, the proposed system automatically formulates queries, issues them to a search interface, selects a subset of the documents, extracts the required information from them, and fills the missing values in the original database. We also exploit inherent dependency within the data to obtain useful information with fewer computational resources. We build such a system in the citation database domain that extracts the missing publication years using limited resources from the Web. We discuss a probabilistic approach for this task and present first results. The main contribution of this paper is to propose a general, comprehensive architecture for designing a system adaptable to different domains.
Silicon photonic network architectures for scalable, power-efficient multi-chip systems
Proceedings of the 37th ACM/IEEE International Symposium on Computer Architecture (ISCA), 2010.
Flip-chip integrated silicon photonic bridge chips for sub-picojoule per bit optical links
Accepted and to appear at the IEEE Electronic Components and Technology Conference (ECTC 2010), June 2010.
Thesis: Debugging and Profiling of Transactional Programs
Transactional memory (TM) has become increasingly popular in recent years as a promising programming paradigm for writing correct and scalable concurrent programs. Despite its popularity, there has been very little work on how to debug and profile transactional programs. This dissertation addresses this situation by exploring the debugging and profiling needs of transactional programs, explaining how the tools should change to support these needs, and implementing preliminary infrastructure to support this change.
Defense date: Tuesday, March 23rd, 4pm, Lubrano Conference Room, CIT Building, Brown University. Includes a few demos for profiling transactional programs using the T-PASS prototype.
A Package Demonstration with Solder Free Compliant Flexible Interconnects.
I. Shubin*, A. Chow, J. Cunningham, M. Giere, N. Nettleton, N. Pinckney, J. Shi, J. Simons, D. Douglas (Oracle, 9515 Towne Centre Drive, San Diego, CA USA 92121); E. M. Chow, D. Debruyker, B. Cheng, G. Anderson (Palo Alto Research Center (PARC), 3333 Coyote Hill Road, Palo Alto, CA USA 94304)
Flexible, stress-engineered spring interconnects are a novel technology that potentially enables room-temperature assembly approaches to building highly integrated and multi-chip modules (MCMs). Such interconnects are an essential solder-free technology facilitating MCM package diagnostics and rework. Previously, we demonstrated the performance, functionality, and reliability of compliant micro-spring interconnects under temperature cycling, humidity bias, and high-current soak. Here, we demonstrate for the first time a package in which the first-level conventional fine-pitch C4 solder bump interconnects are replaced by arrays of microsprings. Dedicated CMOS integrated circuits (ICs) have been assembled onto substrates using these integrated microsprings. Metrology modules on the ICs are designed and used to characterize the connectivity and resistance of each micro-spring site.
Debugging applications at resource constrained virtual machines using dynamically installable lightweight agents
A system for debugging applications at resource-constrained virtual machines may include a target device configured to host a lightweight debug agent to obtain debug information from one or more threads of execution at a virtual machine executing at the target device, and a debug controller. The lightweight debug agent may include a plurality of independently deployable modules. The debug controller may be configured to select one or more of the modules for deployment at the virtual machine for a debug session initiated to debug a targeted thread, to deploy the selected modules at the virtual machine for the debug session, and to receive debug information related to the targeted thread from the lightweight debug agent during the session.
Ultra-low-energy all-CMOS modulator integrated with driver
Optics Express, Vol. 18, Number 3, 2010, pp. 3059-3070.
A Performance Evaluation of 2D-Mesh, Ring, and Crossbar Interconnects for Chip Multi-Processors
International Workshop on Network on Chip Architectures (NoCArc'09), New York, NY, Dec 12, 2009
Circuits for silicon photonics on a 'macrochip'
Digest of Technical Papers, IEEE Asian Solid-State Circuits Conference (ASSCC2009), November 2009, pp. 17-20.
Lazy Continuations for Java Virtual Machines
Continuations, or 'the rest of the computation', are a concept that is most often used in the context of functional and dynamic programming languages. Implementations of such languages that work on top of the Java virtual machine (JVM) have traditionally been complicated by the lack of continuations, which must instead be simulated. We propose an implementation of continuations in the Java virtual machine with a lazy, or on-demand, approach. Our system imposes zero run-time overhead as long as no activations need to be saved and restored, and performs well when continuations are used. Although our implementation can be used from Java code directly, it is mainly intended to facilitate the creation of frameworks that allow other functional or dynamic languages to be executed on a Java virtual machine. As there are no widely used benchmarks for continuation functionality on JVMs, we developed synthetic benchmarks that show the expected costs of the most important operations depending on various parameters.
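To see why native support matters, consider how JVM-hosted languages typically simulate a suspendable computation today: by parking a dedicated thread, as in the sketch below. Lazy continuations avoid both the per-continuation thread and its pre-allocated stack.

```java
import java.util.concurrent.SynchronousQueue;

/** Simulating a suspendable computation on a stock JVM: the producer "suspends"
    by blocking on put(), at the cost of a dedicated thread and its whole stack. */
final class ThreadBackedGenerator {
    private final SynchronousQueue<Integer> channel = new SynchronousQueue<>();

    ThreadBackedGenerator() {
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; ; i++) channel.put(i);   // blocks until a consumer calls next()
            } catch (InterruptedException e) {
                // generator was dropped; let the thread die
            }
        });
        producer.setDaemon(true);
        producer.start();
    }

    int next() throws InterruptedException {
        return channel.take();
    }
}
```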
Simple Fairness Protocols for Daisy Chain Interconnects
Symposium on High-Performance Interconnects (HotI'09), New York
Productive Petascale Computing: Requirements, Hardware, and Software
Supercomputer designers traditionally focus on low-level hardware performance criteria such as CPU cycle speed, disk bandwidth, and memory latency. The High-Performance Computing (HPC) community has more recently begun to realize that escalating hardware performance is, by itself, contributing less and less to real productivity—the ability to develop and deploy high-performance supercomputer applications at acceptable time and cost.
The Defense Advanced Research Projects Agency (DARPA) High Productivity Computing Systems (HPCS) initiative challenged industry vendors to design a new generation of supercomputers that would deliver a 10x improvement in this newly acknowledged but poorly understood domain of real productivity. Sun Microsystems, choosing to abandon customary evolutionary approaches, responded with two revolutionary decisions. The first was to investigate the nature of supercomputer productivity in the full context of use, which includes people, organizations, goals, practices, and skills as well as processors, disks, memory, and software. The second decision was to rethink completely the design of supercomputing systems, informed by productivity-based requirements and driven by recent technological breakthroughs. Crucial to the implementation of these decisions was the establishment of multidisciplinary, closely collaborating teams that conducted research into productivity and developed the many closely intertwined design decisions needed to meet DARPA’s challenge.
Among the most significant results from Sun’s productivity research was a detailed diagnosis of software development as the dominant barrier to productivity improvements in the HPC community. The level of expertise required, combined with the amount of effort needed to develop conventional HPC codes, has already created a crisis of productivity. Even worse, there is no path forward within the existing paradigm that will significantly increase productivity as hardware systems scale up. The same issues also prevent HPC from “scaling out” to a broader class of applications. This diagnosis led to design requirements that address specific issues behind the expertise and effort bottlenecks.
Sun’s design teams explored complex, system-wide tradeoffs needed to meet these requirements in all aspects of the design, including reliability, performance, programmability, and ease of administration. These tradeoffs drew on technological advances in massive chip multithreading, extremely high-performance interconnects, resource virtualization, and programming language design. The outcome was the design for a machine to operate at petascale, with extremely high reliability and a greatly simplified programming model. Although this design supports existing codes and software technologies—crucial requirements—it also anticipates that the greatest productivity breakthroughs will follow from dramatic changes in how HPC codes are developed, changes that require a system of the type designed by Sun’s HPCS team.
Sun Small Programmable Object Technology (Sun SPOTs) and Sensor.Network
Presentation and demo at the Sensor Web Enablement (SWE) working group meeting of the Open Geospatial Consortium (OGC), Cambridge, MA, Jun 23, 2009.
JavaOne Minute with Vipul Gupta
A video demonstrating Sensor.Network filmed live during JavaOne 2009, Jun, 2009.
Generating Transparent, Steerable Recommendations from Textual Descriptions of Items
We propose a recommendation technique that works by collecting text descriptions of the items that we want to recommend and then using this 'textual aura' to compute the similarity between items using techniques drawn from information retrieval. We show how this representation can be used to explain the similarities between items using terms from the textual aura, and further how it can be used to steer the recommender. We describe a system that demonstrates these techniques and detail some preliminary experiments aimed at evaluating the quality of the recommendations and the effectiveness of the explanations of item similarity.
Hierarchical Filesystems Are Dead
For over forty years, we have assumed hierarchical file system namespaces. These namespaces were a rudimentary attempt at simple organization. As users have begun to interact with increasing amounts of data and are increasingly demanding search capability, such a simple hierarchical model has outlasted its usefulness. For this reason, we should design file systems whose organizations map to the ways we access and manipulate data now. We present a new file system architecture in which we replace the hierarchical namespace with a tagged, search-based one.
Experiments with a Solar-powered Sun SPOT
Sun SPOTs are small, battery-powered, wireless embedded devices that can autonomically sense and respond to their environment. These devices have the potential to revolutionize a broad spectrum of applications - environmental monitoring, asset tracking, proactive health care, intelligent agriculture, military surveillance, etc. Many of these require the device to run for long periods (months) using a combination of duty cycling and renewable energy sources (e.g., solar panels). This note describes lessons learned while collecting data from a solar-powered SPOT for a period of nearly four weeks.
An Exit Hole method for Verified Solution of IVPs for ODEs using Linear Programming for the Search of Tight Bounds
In his survey [5], Nedialkov stated that "although high-order Taylor series may be reasonably efficient for mildly stiff ODEs, we do not have an interval method suitable for stiff ODEs." This paper is an attempt to find such a method, based on building a positively invariant set in extended state space. A positively invariant set is treated as a geometric generalization of differential inequalities. We construct a positively invariant set from simpler sets which are not positively invariant, but have an exit hole instead. The exit holes of the simpler sets are suppressed during the construction. This paper considers only sets which are polytopes. Linear interval forms are used to evaluate the projection of the ODE velocity vector onto the normals of the polytope facets. This permits the use of Linear Programming for the search for a tighter positively invariant set. The Exit Hole method is illustrated on the stiff Van der Pol ODE.
MiRTLE: a mixed reality teaching & learning environment
This technical report describes a project to create a mixed reality teaching and learning environment using the virtual world toolkit Project Wonderland. The purpose of this document is to provide details about the background to the project, its goals and achievements. The intended audience for this document is educators, educational technologists, and others interested in the educational applications of virtual worlds.
Kinesis: A New Approach to Replica Placement in Distributed Storage Systems
Kinesis is a novel data placement model for distributed storage systems. It exemplifies three design principles: structure (division of servers into a few failure-isolated segments), freedom of choice (freedom to allocate the best servers to store and retrieve data based on current resource availability), and scattered distribution (independent, pseudo-random spread of replicas in the system). These design principles enable storage systems to achieve balanced utilization of storage and network resources in the presence of incremental system expansions, failures of single and shared components, and skewed distributions of data size and popularity. In turn, this ability leads to significantly reduced resource provisioning costs, good user-perceived response times, and fast, parallelized recovery from independent and correlated failures. This article validates Kinesis through theoretical analysis, simulations, and experiments on a prototype implementation. Evaluations driven by real-world traces show that Kinesis can significantly outperform the widely used Chain replica-placement strategy in terms of resource requirements, end-to-end delay, and failure recovery.
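A minimal sketch of how the three principles might compose, with illustrative parameters (two candidate servers per segment, hash-derived pseudo-randomness); this is an illustration of the design principles, not the actual Kinesis algorithm:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Random;

final class KinesisStylePlacement {
    /** Places each replica in a hash-derived segment, then keeps the less-loaded of two
        pseudo-random candidate servers inside that segment ("freedom of choice"). */
    static List<Integer> placeReplicas(String key, int segments, int serversPerSegment,
                                       int replicas, long[] loadByServer) {
        List<Integer> chosen = new ArrayList<>();
        for (int r = 0; r < replicas; r++) {
            // A real system would guarantee distinct, failure-isolated segments per replica;
            // hash collisions between replicas are ignored here for brevity.
            int segment = Math.floorMod(Objects.hash(key, r), segments);
            Random prng = new Random(Objects.hash(key, r, "candidates"));
            int best = -1;
            for (int c = 0; c < 2; c++) {
                int s = segment * serversPerSegment + prng.nextInt(serversPerSegment);
                if (best < 0 || loadByServer[s] < loadByServer[best]) best = s;
            }
            chosen.add(best);
        }
        return chosen;
    }
}
```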
Prediction-time Active Feature-value Acquisition for Cost-Effective Customer Targeting
In general, the prediction capability of classification models can be enhanced by acquiring additional relevant features for instances. However, in many cases there is a significant cost associated with this additional information, driving the need for an intelligent acquisition strategy. Motivated by real-world customer targeting domains, we consider the setting where a fixed set of additional features can be acquired for a subset of the instances at test time. We study different acquisition strategies for selecting instances for which to acquire more information, so as to obtain the most improvement in prediction performance per unit cost. We apply our methods to various targeting datasets and show that we can achieve better prediction performance by actively acquiring features for only a small subset of instances, compared to a random-sampling baseline.
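As an illustration of one such strategy (not necessarily the paper's), the sketch below ranks test instances by classifier uncertainty per unit acquisition cost and spends a fixed budget greedily:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ActiveAcquisition {
    /** A test-time instance we could buy extra features for. */
    record Candidate(int instanceId, double predictedPositiveProb, double acquisitionCost) {}

    /** Greedy uncertainty-per-cost selection under a fixed budget. */
    static List<Integer> selectForAcquisition(List<Candidate> pool, double budget) {
        List<Candidate> ranked = new ArrayList<>(pool);
        // Uncertainty of a probabilistic classifier peaks at p = 0.5.
        ranked.sort(Comparator.comparingDouble(
                c -> -((1 - Math.abs(c.predictedPositiveProb() - 0.5) * 2) / c.acquisitionCost())));
        List<Integer> chosen = new ArrayList<>();
        double spent = 0;
        for (Candidate c : ranked) {
            if (spent + c.acquisitionCost() > budget) continue;
            chosen.add(c.instanceId());
            spent += c.acquisitionCost();
        }
        return chosen;
    }
}
```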
A hardware-assisted concurrent & parallel GC algorithm
Tutorial on the Maxwell algorithm (hardware assistance for concurrent and parallel GC) for an external audience. This is a draft for early release to academic collaborators.
A Mixed Reality Teaching and Learning Environment
This work-in-progress paper describes collaborative research, taking place on three continents, towards creating a 'mixed reality teaching & learning environment' (MiRTLE) that enables teachers and students participating in real-time mixed and online classes to interact with avatar representations of each other. The longer-term hypothesis to be investigated is that avatar representations of teachers and students will help create a sense of shared presence, engendering a sense of community and improving student engagement in online lessons. This paper explores the technology that will underpin such systems by presenting work on the use of a massively multi-user game server, based on Sun's Project Darkstar and Project Wonderland tools, to create a shared teaching environment, illustrating the process by describing the creation of a virtual classroom. We describe the Shanghai NEC eLearning system that will form the platform for the deployment of this work. As these systems take on an increasingly global reach, we discuss how cross-cultural issues will affect such systems. We conclude by outlining our future plans to test our hypothesis by deploying this technology on a live system with some 15,000 online users.
Introducing EclipseLink
The Eclipse Persistence Services Project, more commonly known as EclipseLink, is a comprehensive open source persistence solution. EclipseLink was started by a donation of the full source code and test suites of Oracle's TopLink product. This project brings the experience of over 12 years of commercial usage and feature development to the entire Java community. This evolution into an open source project is now complete and developers will soon have access to the EclipseLink 1.0 release.
The Energy Cost of SSL in Deeply Embedded Systems
As the number of potential applications for tiny, battery-powered, "mote"-like, deeply embedded devices grows, so does the need to simplify and secure interactions with such devices. Embedding a secure web server (capable of HTTP over SSL, aka HTTPS), enables these devices to be monitored and controlled securely via a user-friendly, browser-based interface.
This paper presents the first empirical energy analysis of the Internet's dominant security protocol, SSL, on highly constrained devices. We have enhanced Sizzle, our tiny-footprint HTTPS stack, with energy conserving features and measured its performance on a Telos mote. We show that the key exchange phase, which consumes much more energy than bulk encryption and authentication, amortizes well over the transmission of a few kilobytes of application data. Such amortization is easily attained with features like session reuse and persistent HTTP(S), both of which are supported by Sizzle. The extra energy cost of encrypting and authenticating application data with SSL is around 15%. With the addition of an application-level, duty-cycle based approach to low-power listening for incoming service requests, a pair of alkaline batteries can power Sizzle for over a year under a variety of application scenarios.
Flow Control in Output Buffered Switch with Input Groups
High Performance Switching and Routing Conference (HPSR'08), Shanghai, China
Using Ontologies and Vocabularies for Dynamic Linking
Ontology-based linking offers a solution to some of the problems with static, restricted, and inflexible traditional Web linking. Conceptual hypermedia provides navigation between Web resources, supported by a conceptual model, in which an ontology's definitions and structure, together with the lexical labels, drive the consistency of link provision and the linking's dynamic aspects. Lightweight standard representations make it possible to use existing vocabularies to support Web navigation and browsing. In this way, the navigation and linking of diverse resources (including those not in our control) based on a community understanding of the domain can be consistently managed.
Project Sun SPOT: A Java Technology-Enabled Platform for Ubiquitous Computing
Technical Session TS-6495, JavaOne, May 2008. [The Networking section starts at 18 min 57 sec and the Security section at 22 min 46 sec into the video.]
Validated method for IVPs for Ordinary Differential Equations based on Chaplygin's inequalities
Standard numerical methods for initial value problems (IVPs) for ordinary differential equations (ODEs) return an approximate solution only. Validated (also called interval) methods for IVPs for ODEs return an approximate solution together with a rigorous enclosure of the true solution. A widely known validated method for IVPs for ODEs is the interval Hermite-Obreschkoff (IHO) method. This method runs into difficulties on stiff ODEs. The method of Chaplygin's inequalities is less known. However, it might be more suitable for problems like an interval Spice simulator, because electrical circuits are described in Spice by stiff empirical ODEs which are not smooth enough. This memo describes the IHO and Chaplygin validated methods and studies their stability on a simple ODE dy/dt = -y.
Sun Small Programmable Object Technology
Sun Labs Open House, Apr 2008.
This presentation makes extensive use of animations, which were lost in the process of converting to PDF. Watch the presentation video if you find the PDF slides confusing. The networking and security section starts roughly 33 min 15 sec into the video.
Usable Security on Sun SPOTs
Lightning Talk, Java Mobile & Embedded Developer Days, Jan 23-24, 2008.
Dynamic Linking of Web Resources: Customisation and Personalisation
Conceptual Open Hypermedia Service (COHSE) provides a framework that integrates a knowledge service and the open hypermedia link service to dynamically link Web documents via knowledge resources (e.g., ontologies or controlled vocabularies). The Web can be considered a closed hypermedia system: links on the Web are unidirectional, embedded, and difficult to author and maintain. With a Semantic Web architecture, COHSE addresses these limitations by dynamically creating multi-headed links on third-party documents by integrating third-party knowledge resources and third-party services. Openness is therefore a key aspect of COHSE. This chapter first presents how the COHSE architecture is re-engineered to support customisation and to create an adaptable open hypermedia system where the user explicitly provides information about himself. It then presents how this architecture is deployed in a portal and discusses how this portal architecture can be extended to turn COHSE from an adaptable system into an adaptive system where the system implicitly infers some information about the user.
Backlog Aware Low Complexity Schedulers for Input Queued Packet Switches
Symposium on High-Performance Interconnects (Hot Interconnects), Stanford University
Multiterabit Switch Fabrics Enabled by Proximity Communication
Symposium on High-Performance Chips (Hot Chips), Stanford University
Using horizontal displays for distributed and collocated agile planning
Computer-supported environments for agile project planning are often limited by the capability of the hardware to support collaborative work. We present DAP, a tool developed to aid distributed and collocated teams in agile planning meetings. Designed with a multi-client architecture, it works on standard desktop computers and digital tables. Using digital tables, DAP emulates index card based planning without requiring team members to be in the same room.
PWWFA: Parallel Wave Front Arbiter for Large Switches
High Performance Switching and Routing Conference (HPSR'07), Brooklyn, New York
Introduction and evaluation of Martlet, a scientific workflow language for abstracted parallelisation.
The workflow language Martlet described in this paper implements a new programming model that allows users to write parallel programs and analyse distributed data without having to be aware of the details of the parallelisation. Martlet abstracts the parallelisation of the computation and the splitting of the data through the inclusion of constructs inspired by functional programming. These allow programs to be written as an abstract description that can be adjusted automatically at runtime to match the data set and available resources. Using this model it is possible to write programs to perform complex calculations across a distributed data set, such as Singular Value Decomposition or Least Squares problems, as well as creating an intuitive way of working with distributed systems. Having described and evaluated Martlet against other functional languages for parallel computation, this paper goes on to look at how Martlet might develop. In doing so it covers both possible additions to the language itself, and the use of JIT compilers to increase the range of platforms it is capable of running on.
A platform for wireless networked transducers
As computers, sensors, and wireless communication have become smaller, cheaper, and more sophisticated, wireless transducer platforms have become a focus of research and commercial interest. This report describes an investigation into such platforms. It presents a new taxonomy of transducer systems, describes the construction of prototypes of a new transducer device designed for ease of application development, and discusses commercialization issues.
Balancing Security and Ease-of-Use on the Sun SPOTs
Sun Labs Open House, Apr, 2007.
Deleting Files in the Celeste Peer-to-Peer Storage System
Celeste is a robust peer-to-peer object store built on top of a distributed hash table (DHT). Celeste is a working system, developed by Sun Microsystems Laboratories. During the development of Celeste, we faced the challenge of complete object deletion, and moreover, of deleting "files" composed of several different objects. This important problem is not solved by merely deleting meta-data, as there are scenarios in which all file contents must be deleted, e.g., due to a court order. Complete file deletion in a realistic peer-to-peer storage system has not been previously dealt with due to the intricacy of the problem - the system may experience high churn rates, nodes may crash or have intermittent connectivity, and the overlay network may become partitioned at times. We present an algorithm that eventually deletes all file content, data and meta-data, in the aforementioned complex scenarios. The algorithm is fully functional and has been successfully integrated into Celeste.
Open Source and You
The real value of open-source software is the community it fosters.
Resource Partitioning in a Java Operating Environment
Managing the partitioning of resources between uncooperating applications is a fundamental requirement of an operating environment. Traditional operating environments only manage low-level resources which presents an impedance mismatch for internet-facing applications with service levels defined in terms of application-level transactions. The Multi-tasking Virtual Machine (MVM) and associated Resource Management API (RM) provide basic mechanisms for managing multiple applications within a Java operating environment. RM separates mechanism and policy and takes the unusual position of delegating rate-based management of resources to the policy level. This report describes the design and implementation of policies that provide flexible resource partitioning among applications and shows their effectiveness using microbenchmarks and an application level benchmark. The latter demonstrates the partitioning of an application-specific resource among a set of application instances using exactly the same policies as used for machine-level resources.
Personalised Dynamic Links on the Web
Links on the Web are unidirectional, embedded, difficult to author and maintain. With a Semantic Web architecture, COHSE (Conceptual Open Hypermedia System) aims to address these limitations by dynamically creating links on the Web. Here we present how this architecture is extended and modified to support customisation and create an adaptable open system by using third party ontologies and services to discover resources on the Web. We then present the deployment of this in a portal and discuss possible extensions to create an adaptive system to dynamically create personalised links.
Software Productivity Research In High Performance Computing
The challenge of utilizing supercomputers effectively at ever increasing scale is not being met, a phenomenon perceived within the high performance computing (HPC) community as a crisis of "productivity." Acknowledging that narrow focus on peak machine performance numbers has not served HPC goals well in the past, and acknowledging that the "productivity" of a computing system is not a well-understood phenomenon, the Defense Advanced Research Projects Agency (DARPA) created the High Productivity Computing Systems (HPCS) program: industry vendors were challenged to develop a new generation of supercomputers that are dramatically (10 times!) more productive, not just faster, and a community of vendor teams and non-vendor research institutions was challenged to develop an understanding of supercomputer productivity that would serve to guide future supercomputer development and support productivity-based evaluation of computing systems. The HPCS Productivity Team at Sun Microsystems responded with two commitments: to investigate supercomputer productivity in the full context of use, and to put the investigation of these phenomena on the soundest scientific basis possible, drawing on well-established research methodologies from relevant fields, many of which are unfamiliar within the HPC community.
COHSE: dynamic linking of web resources
This document presents a description of the COHSE collaborative research project between Sun Microsystems Laboratories and the School of Computer Science at the University of Manchester, UK. The purpose of this document is to summarise the project in terms of the work completed and the results achieved. The focus of the project was an application to enable the dynamic creation of hypertext links between documents on a Web; thus the intended audience for this document comprises those members of academic and industrial research groups whose focus includes the Web in general and the Semantic Web and Hypertext in particular.
Conscientious Software
Software needs to grow up and become responsible for itself and its own future by participating in its own installation and customization, maintaining its own health, and adapting itself to new circumstances, new users, and new uses. To create such software will require us to change some of our underlying assumptions about how we write programs. A promising approach seems to be to separate software that does the work (allopoietic) from software that keeps the system alive (autopoietic).
Programming the world with Sun SPOTs
We describe the Sun Small Programmable Object Technology, or Sun SPOT. The Sun SPOT is a small wireless computing platform that runs Java directly, with no operating system. The system comes with an on-board set of sensors, I/O pins for easy connection to external devices, and supporting software.
Introspection of a Java Virtual Machine under Simulation
Virtual machines are commonly used in commercially-significant systems, for example, Sun Microsystems' Java and Microsoft's .NET. The virtual machine offers many advantages to the system designer and administrator, but complicates the task of workload characterization: it presents an extra abstraction layer between the application and observed hardware effects. Understanding the behavior of the virtual machine is therefore important for all levels of the system architecture.
We have constructed a tool which examines the state of a Sun Java HotSpot virtual machine running inside Virtutech's Simics execution-driven simulator. We can obtain detailed information about the virtual machine and application without disturbing the state of the simulation. For data, we can answer such questions as: Is a given address in the heap? If so, in which object? Of what class? For code, we can map program counter values back to Java methods and approximate Java source line information. Our tool allows us to relate individual events in the simulation, for example, a cache miss, to the higher-level behavior of the application and virtual machine.
In this report, we present the design of our tool, including its capabilities and limitations, and demonstrate its application on the simulation's cache contents and cache misses.
Martlet: A scientific workflow language for abstracted parallelisation.
This paper describes a workflow language, 'Martlet', for the analysis of large quantities of distributed data. This workflow language is fundamentally different from other languages as it implements a new programming model. Inspired by the inductive constructs of functional programming, this programming model allows the language to abstract away the complexities of data and processing distribution. This means the user is not required to have any knowledge of the underlying architecture or of how to write distributed programs. As well as making distributed resources available to more people, this abstraction also reduces the potential for errors when writing distributed programs. While this abstraction places some restrictions on the user, it is descriptive enough to describe a large class of problems, including algorithms for solving Singular Value Decompositions and Least Squares problems. Currently this language runs on a stand-alone middleware. This middleware can however be adapted to run on top of a wide range of existing workflow engines through the use of JIT compilers capable of producing other workflow languages at run time. This makes this work applicable to a huge range of computing projects.
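The flavor of the model can be sketched in ordinary Java: an analysis is expressed as a per-chunk map plus an associative combine, so the same abstract description works no matter how many pieces the data is split into at runtime. The distributed-mean example below is illustrative, not Martlet syntax:

```java
import java.util.List;

final class DistributedMean {
    record Partial(double sum, long count) {
        Partial combine(Partial o) { return new Partial(sum + o.sum, count + o.count); }
        double mean() { return count == 0 ? 0 : sum / count; }
    }

    // Per-chunk step: runs wherever a chunk of the data happens to live.
    static Partial mapChunk(List<Double> chunk) {
        return new Partial(chunk.stream().mapToDouble(Double::doubleValue).sum(), chunk.size());
    }

    // The combine step is associative, so the result is the same for any number of chunks.
    static double mean(List<List<Double>> chunks) {
        return chunks.stream().map(DistributedMean::mapChunk)
                     .reduce(new Partial(0, 0), Partial::combine)
                     .mean();
    }
}
```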
Enterprise Mobility
With the proliferation of wireless technologies and business globalization, mobility of people and devices has become inevitable. The experience of mobile computing in different campuses or drop-in offices also faces the challenges of starting up applications and tools, synchronizing filesystems, maintaining one's desktop environment, or even finding a printer location or network services. In this document, we discuss different types of mobility, issues with mobility, and why it is important to consider these issues. Finally, this document discusses a network layer solution for IP mobility for continuous connectivity. It also sheds light on future directions of research on mobility that might be interesting for Sun Microsystems.
Data access and analysis with distributed federated data servers in climateprediction.net.
climateprediction.net is a large public-resource distributed scientific computing project. Members of the public download and run a full-scale climate model, donate their computing time to a large perturbed-physics ensemble experiment to forecast the climate in the 21st century, and submit their results back to the project. The amount of data generated is large, consisting of tens of thousands of individual runs, each on the order of tens of megabytes. The overall dataset is, therefore, on the order of terabytes. Access and analysis of the data is further complicated by the reliance on donated, distributed, federated data servers. This paper discusses the problems encountered when the data required for even a simple analysis is spread across several servers, and how web service technology can be used; how different user interfaces with varying levels of complexity and flexibility can be presented to the application scientists; how using existing web technologies such as HTTP, SOAP, XML, HTML and CGI can engender the reuse of code across interfaces; and how application scientists can be notified of their analysis' progress and results in an asynchronous architecture.
Knowledge-Driven Hyperlinks: Linking in the Wild
Since Ted Nelson coined the term “Hypertext”, there has been extensive research on non-linear documents. With the enormous success of the Web, non-linear documents have become an important part of our daily life activities. However, the underlying hypertext infrastructure of the Web still lacks many features that Hypertext pioneers envisioned. With advances in the Semantic Web, we can address and improve some of these limitations. In this paper, we discuss some of these limitations, developments in Semantic Web technologies and present a system – COHSE – that dynamically links Web pages. We conclude with remarks on future directions for semantics-based linking.
Elliptic Curve Cryptography (ECC) Cipher Suites for Transport Layer Security (TLS)
IETF RFC 4492, May. 2006.
Policy-based Management of a JDBC Connection Pool
Managing the communication between an application server and a back-end database is essential for scalability and crucial for good performance. The standard mechanism uses a variable-sized pool of connections, but typical application servers provide very rudimentary, implementation-centric pool control mechanisms. This requires administrators to manually translate service level specifications into the pool control mechanism, and to adjust these as the load or machine configurations change. We describe the use of a resource management framework to automatically control connection pool parameters based on externally supplied policies. This simplifies the connection pool implementation while at the same time allowing a variety of policies to be applied, including policies that automatically adapt to changing circumstances.
The implementation of two distinct policies is discussed, and performance measurements are reported for a contemporary synthetic application benchmark.
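A minimal sketch of the idea, with a hypothetical pool interface (not the paper's actual resource management API): an externally supplied policy observes the pool and nudges its parameters toward a service-level target.

```java
/** Hypothetical pool surface; the names are illustrative, not the framework's API. */
interface TunablePool {
    int maxSize();
    void setMaxSize(int n);
    double avgWaitMillis();   // observed time callers wait to obtain a connection
}

/** Externally supplied policy: grow under contention, shrink when comfortably idle. */
final class LatencyTargetPolicy {
    private final TunablePool pool;
    private final double targetWaitMillis;

    LatencyTargetPolicy(TunablePool pool, double targetWaitMillis) {
        this.pool = pool;
        this.targetWaitMillis = targetWaitMillis;
    }

    /** Invoked periodically by the resource management framework. */
    void tick() {
        double observed = pool.avgWaitMillis();
        if (observed > targetWaitMillis) {
            pool.setMaxSize(pool.maxSize() + 1);
        } else if (observed < targetWaitMillis / 4 && pool.maxSize() > 1) {
            pool.setMaxSize(pool.maxSize() - 1);
        }
    }
}
```

Swapping in a different policy object changes the pool's behavior without touching the pool implementation, which is the separation of mechanism and policy the abstract describes.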
Suite B Enablement in TLS: A Report on Interoperability Testing Between Sun, Red Hat and Microsoft
Invited presentation at NIST's 5th Annual PKI R&D Workshop, Apr 5, 2006 (co-presenters: Robert Relyea, Red Hat and Kelvin Yiu, Microsoft).
Scientific middleware for abstracted parallelisation.
In this paper we introduce a class of problems that arise when the analysis of data split into an unknown number of pieces is attempted. Such analysis falls under the definition of Grid computing, but fails to be addressed by the current Grid computing projects, as they do not provide the appropriate abstractions. We then describe a distributed web service based middleware platform, which solves these problems by supporting construction of parallel data analysis functions for datasets with an unknown level of distribution. This analysis is achieved through the combination of Martlet, a workflow language that uses constructs from functional programming to abstract the parallelisation in computations away from the user, and the construction of supporting middleware. To construct such a supporting middleware it is necessary to provide the capability to reason about the data structures held without restricting their nature. Issues covered in the development of this supporting middleware include the ability to handle distributed data transfer and management, function deployment and execution.
Writing Solaris Device Drivers in Java
We present an experimental implementation of the Java Virtual Machine that runs inside the kernel of the Solaris operating system. The implementation was done by porting an existing small, portable JVM, Squawk, into the Solaris kernel. Our first application of this system is to allow device drivers to be written in Java. A simple device driver was ported from C to Java. Characteristics of the Java device driver and our device driver interface are described.
Yes, There is an "Expertise Gap" in HPC Applications Development
Third Workshop on Productivity and Performance in High-End Computing (P-PHEC), 12 February 2006, Austin, Texas
Abstract:
The High Productivity Computing Systems (HPCS) program seeks a tenfold productivity increase in High Performance Computing (HPC), where productivity is understood to be a composite of system performance, system robustness, programmability, portability, and administrative concerns. Of these, programmability is the least well understood and perceived to be the most problematic. It has been suggested that an "expertise gap" is at the heart of the problem in HPC application development. Preliminary results from research conducted by Sun Microsystems and other participants in the HPCS program confirm that such an "expertise gap" does exist and does exert a significant confounding influence on HPC application development. Further, the nature of the "expertise gap" appears not to be amenable to previously proposed solutions such as "more education" and "more people." A productivity improvement of the scale sought by the HPCS program will require fundamental transformations in the way HPC applications are developed and maintained.
An Overview of the Singularity Project
Singularity is a research project in Microsoft Research that started with the question: what would a software platform look like if it was designed from scratch with the primary goal of dependability? Singularity is working to answer this question by building on advances in programming languages and tools to develop a new system architecture and operating system (named Singularity), with the aim of producing a more robust and dependable software platform. Singularity demonstrates the practicality of new technologies and architectural decisions, which should lead to the construction of more robust and dependable systems.
Sizzle: A Standards-based End-to-End Security Architecture for the Embedded Internet
According to popular perception, public-key cryptography is beyond the capabilities of highly constrained, "mote"-like, embedded devices. We show that elliptic curve cryptography not only makes public-key cryptography feasible on these devices, it allows one to create a complete secure web server stack that runs efficiently within very tight resource constraints. Our small-footprint HTTPS stack, nicknamed Sizzle, has been implemented on multiple generations of the Berkeley/Crossbow motes where it runs in less than 4KB of RAM, completes a full SSL handshake in 1 second (session reuse takes 0.5 seconds) and transfers 1 KB of application data over SSL in 0.4 seconds. Sizzle is the world's smallest secure web server and can be embedded inside home appliances, personal medical devices, etc., allowing them to be monitored and controlled remotely via a web browser without sacrificing end-to-end security.
This report is an extended version of a paper that received the 'Mark Weiser Best Paper Award' at the Third IEEE International Conference on Pervasive Computing and Communications (PerCom), Hawaii, March 2005.
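Session reuse of the kind Sizzle exploits can be observed with a stock JSSE client: sockets created from the same context share a session cache, so the second connection below should resume the cached session rather than repeat the full handshake (host and port are illustrative).

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public final class SessionReuseDemo {
    public static void main(String[] args) throws Exception {
        SSLSocketFactory factory = SSLContext.getDefault().getSocketFactory();
        for (int i = 0; i < 2; i++) {
            try (SSLSocket s = (SSLSocket) factory.createSocket("example.com", 443)) {
                s.startHandshake();
                // A resumed session reports the creation time of the original full handshake.
                System.out.println("handshake " + i + ", session created at "
                        + s.getSession().getCreationTime());
            }
        }
    }
}
```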
Can Software Engineering Solve the HPCS Problem?
Second International Workshop on Software Engineering for High Performance Computing System Applications, St. Louis, Missouri, May 15, 2005
Abstract:
The High Productivity Computing Systems (HPCS) program seeks a tenfold productivity improvement. Software Engineering has addressed this goal in other domains and identified many important principles that, when aligned with hardware and computer science technologies, do make dramatic improvements in productivity. Do these principles work for the HPC domain?
This case study collects data on the potential benefits of perfective maintenance in which human productivity (programmability, readability, verifiability, maintainability) is paramount. An HPC professional rewrote four FORTRAN77/MPI benchmarks in Fortran 90, removing optimizations (many improving distributed memory performance) and emphasizing clarity.
The code shrank by 5-10x and is significantly easier to read and relate to specifications. Run time performance slowed by about 2x. More studies are needed to confirm that the resulting code is easy to maintain and that the lost performance can be recovered with compiler optimization technologies, run time management techniques and scalable shared memory hardware.
HPC Needs a Tool Strategy
Second International Workshop on Software Engineering for High Performance Computing System Applications, St. Louis, Missouri, May 15, 2005
Abstract:
The High Productivity Computing Systems (HPCS) program seeks a tenfold productivity increase in High Performance Computing (HPC). A change of this magnitude in software development and maintenance demands a transformation similar to other great leaps in industrial productivity. By analogy, this requires a dramatic change to the "infrastructure" and to the way software developers use it. Software tools such as compilers, libraries, debuggers and analyzers constitute an essential part of the HPC infrastructure, without which codes cannot be efficiently developed nor production runs accomplished.
The underappreciated "HPC software infrastructure" is not up to the task and is becoming less so in the face of increasing scale, complexity, and mission importance. Infrastructure dependencies are seen as significant risks to success, and significant productivity gains remain unrealized. Support models for this infrastructure are not aligned with its strategic value.
To achieve the potential of the software infrastructure, both for stability and for productivity breakthroughs, a dedicated, long-term, client-focused support structure must be established. Goals for tools in the infrastructure would include ubiquity, portability, and longevity commensurate with the projects they support, typically decades. The strategic value of such an infrastructure necessarily transcends individual projects, laboratories, and organizations.
Secure Adhoc Communication
Technical overview of the project.
Innovation Happens Elsewhere: Open Source as Business Strategy
It's a plain fact: regardless of how smart, creative, and innovative your organization is, there are more smart, creative, and innovative people outside your organization than inside. Open source offers the possibility of bringing more innovation into your business by building a creative community that reaches beyond the barriers of the business. The key is developing a web-driven community where new types of collaboration and creativity can flourish. Since 1998 Ron Goldman and Richard Gabriel have been helping groups at Sun Microsystems understand open source and advising them on how to build successful communities around open source projects. In this book the authors present lessons learned from their own experiences with open source, as well as those from other well-known projects such as Linux, Apache, and Mozilla.
Security Issues in Wireless Sensor Networks
Invited presentation at the 10th FBI Information Technology Study Group Workshop, Apr 21, 2005.
A Cryptographic Processor for Arbitrary Elliptic Curves over GF(2^m)
International Journal of Embedded Systems, Feb. 2005. Extended version of the paper that won the Best Paper award at IEEE ASAP 2003.
An Object-aware Memory Architecture
Despite its dominance, object-oriented computation has received scant attention from the architecture community. We propose a novel memory architecture that supports objects and garbage collection (GC). Our architecture is co-designed with a Java Virtual Machine to improve the functionality and efficiency of heap memory management. The architecture is based on an address space for objects accessed using object IDs mapped by a translator to physical addresses. To support this, the system includes object-addressed caches, a hardware GC barrier to allow in-cache GC of objects, and an exposed cache structure cooperatively managed by the JVM. These extend a conventional architecture, without compromising compatibility or performance for legacy binaries.
Our innovations enable various improvements such as: a novel technique for parallel and concurrent garbage collection, without requiring any global synchronization; an in-cache garbage collector, which never accesses main memory; concurrent compaction of objects; and elimination of most GC store barrier overhead. We compare the behavior of our system against that of a conventional generational garbage collector, both with and without an explicit allocate-in-cache operation. Explicit allocation eliminates many write misses; our scheme additionally trades L2 misses for in-cache operations, and provides the mapping indirection required for concurrent compaction.
The use of capability descriptions in a wireless transducer network
This document presents the requirements for a language to describe the capabilities of a transducer in a wireless transducer network (WTN). It provides a survey of existing technologies in this field and concludes with a framework in which the capabilities of a transducer can be employed to assist users in the configuration of a WTN. The intended audience for this paper comprises members of academic and industrial research groups whose focus is networked devices, such as those used in wireless sensor networks.
Sizzle -- SSL on Motes
Invited presentation at U.C. Berkeley's CENTS Retreat, Tahoe, Jan. 2005.
Experiments in Wireless Internet Security
In Statistical Methods in Computer Security, William W. S. Chen (Editor), Dekker/CRC Press, pp. 33-47.
Partitioning of Code for a Massively Parallel Machine
Code partitioning is the problem of dividing sections of code among a set of processors for execution in parallel taking into account the communication overhead between the processors. Code partitioning of large amounts of code onto numerous processors requires variations to the classical partitioning algorithms, in part due to the memory and time requirements to partition a large set of data, but also due to the nature of the target machine and multiple constraints imposed by its architectural features.
In this paper, we present our experience in the design of enhancements to the classical multi-level k-way partitioning algorithm to deal with large graphs of over 1 million nodes, 5 constraints, and nodes of irregular size. Our algorithm was implemented to produce code for a massively parallel machine of up to 40,000 processors, and forms part of a hardware description language compiler. The algorithm and the compiler were tested on RTL designs for a next generation SPARC® processor. We present performance results and comparisons for partitioning multi-processor hardware designs.
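For flavor, a toy single-pass placement heuristic with one balance constraint is sketched below; the paper's algorithm is multi-level, handles five constraints, and refines across levels, so this illustrates the objective (minimize cut weight under balance limits) rather than the method itself.

```java
import java.util.Arrays;
import java.util.List;

final class GreedyPartition {
    /** adj[v] holds {neighbor, edgeWeight} pairs. Each node goes to the partition where it
        cuts the least edge weight, subject to a simple size cap (assumes n <= k * cap). */
    static int[] partition(List<int[]>[] adj, int k, int cap) {
        int n = adj.length;
        int[] part = new int[n], size = new int[k];
        Arrays.fill(part, -1);
        for (int v = 0; v < n; v++) {
            long[] gain = new long[k];   // edge weight we avoid cutting by co-locating v
            for (int[] e : adj[v]) if (part[e[0]] >= 0) gain[part[e[0]]] += e[1];
            int best = -1;
            for (int p = 0; p < k; p++)
                if (size[p] < cap && (best < 0 || gain[p] > gain[best])) best = p;
            part[v] = best;
            size[best]++;
        }
        return part;
    }
}
```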
Garbage-first garbage collection
Garbage-First is a server-style garbage collector, targeted for multi-processors with large memories, that meets a soft real-time goal with high probability, while achieving high throughput. Whole-heap operations, such as global marking, are performed concurrently with mutation, to prevent interruptions proportional to heap or live-data size. Concurrent marking both provides collection "completeness" and identifies regions ripe for reclamation via compacting evacuation. This evacuation is performed in parallel on multiprocessors, to increase throughput.
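The region-selection idea can be sketched as follows (the cost model and numbers are illustrative, not HotSpot's): evacuate the regions with the most reclaimable space first, stopping when the pause-time budget is exhausted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class CollectionSet {
    record Region(int id, long liveBytes, long regionBytes, double predictedEvacMillis) {}

    /** Pick regions with the most garbage (regionBytes - liveBytes) until the budget is spent. */
    static List<Region> choose(List<Region> regions, double pauseBudgetMillis) {
        List<Region> byGarbage = new ArrayList<>(regions);
        byGarbage.sort(Comparator.comparingLong(r -> r.liveBytes() - r.regionBytes()));
        List<Region> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : byGarbage) {
            if (spent + r.predictedEvacMillis() > pauseBudgetMillis) break;
            chosen.add(r);
            spent += r.predictedEvacMillis();
        }
        return chosen;
    }
}
```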
A Comparative Study of Persistence Mechanisms for the Java™ Platform
Access to persistent data is a requirement for the majority of computer applications. The Java programming language and associated run-time environment provide excellent features for the construction of reliable and robust applications, but currently these do not extend to the domain of persistent data. Many mechanisms for managing persistent data have been proposed, some of which are now included in the standard Java platforms, e.g., J2SE™ and J2EE™.
This paper defines a set of criteria by which persistence mechanisms may be compared and then applies the criteria to a representative set of widely used mechanisms. The criteria are evaluated in the context of a widely-known benchmark, which was ported to each of the mechanisms, and include performance and scalability results.
Maintaining Object Ordering in a Shared P2P Storage Environment
Modern peer-to-peer (P2P) storage systems have evolved to provide solutions to a variety of pressing storage problems. While the first generation provided rather informal file sharing, more recent approaches provide more extensive security, sharing, and archive capabilities.
To be considered a viable storage solution the system must exhibit high availability and data persistence characteristics. In an attempt to provide these, most systems assume a continuously connected and available underlying communication infrastructure. But this is not necessarily the case because equipment failures, denial of service attacks, and just poor (yet common) corporate network design may cause discontinuities and interruptions in the communication service. Any proposed storage solution needs to address such issues transparently.
Storage archival systems can live with discontinuities, as long as the stored data can be uniquely identified. Continuous update systems that allow updating data by multiple writers have harder problems to overcome since the ordering of updates needs to be maintained independently of connectivity conditions. In this paper, we propose a solution for maintaining the ordering even under severe connectivity disruptions, allowing the system to continue functioning while connectivity is disrupted, and to recover from the disruption smoothly when connectivity is restored.
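The abstract does not name its ordering mechanism, but version vectors are the standard device for ordering multi-writer updates across network partitions; a minimal sketch of that device is shown below for context.

```java
import java.util.HashMap;
import java.util.Map;

/** Version-vector sketch: each writer bumps its own counter; two vectors that are
    not ordered reveal updates made concurrently (e.g. during a partition) that
    need reconciliation once connectivity is restored. */
final class VersionVector {
    private final Map<String, Long> clock = new HashMap<>();

    void bump(String writerId) { clock.merge(writerId, 1L, Long::sum); }

    /** true iff this vector is <= other component-wise ("happened before or equal"). */
    boolean precedes(VersionVector other) {
        for (var e : clock.entrySet())
            if (e.getValue() > other.clock.getOrDefault(e.getKey(), 0L)) return false;
        return true;
    }

    boolean concurrentWith(VersionVector other) {
        return !precedes(other) && !other.precedes(this);
    }
}
```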
Accelerating Next-Generation Public-key Cryptography on General-Purpose CPUs
Hot Chips 16, Aug. 2004. Selected as one of the Best Papers.
Using gated experts in fault diagnosis and prognosis
Three individual experts have been developed, based on extended auto-associative neural networks (E-AANN), Kohonen self-organizing maps (KSOM), and radial basis function-based clustering (RBFC) algorithms. We then propose an integrated method that combines the set of individual experts, managed by a gated-experts algorithm which assigns each expert to the regions where it performs best. We have used a Matlab Simulink model of a chiller system and applied the individual experts and the integrated method to detect and recover sensor errors. We show that the integrated method achieves better diagnostic and prognostic performance than each individual expert.
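A skeletal view of the gating step, with illustrative types (the region assignment and per-region validation errors are assumed to be computed offline, and none of these names come from the paper):

```java
import java.util.function.ToIntFunction;

final class GatedExperts {
    interface Expert { double predict(double[] x); }

    private final Expert[] experts;
    private final double[][] regionError;        // regionError[r][e]: validation error of expert e in region r
    private final ToIntFunction<double[]> gate;  // maps an input to its region index

    GatedExperts(Expert[] experts, double[][] regionError, ToIntFunction<double[]> gate) {
        this.experts = experts;
        this.regionError = regionError;
        this.gate = gate;
    }

    /** Route the input to whichever expert performed best in the input's region. */
    double predict(double[] x) {
        int r = gate.applyAsInt(x);
        int best = 0;
        for (int e = 1; e < experts.length; e++)
            if (regionError[r][e] < regionError[r][best]) best = e;
        return experts[best].predict(x);
    }
}
```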
Grid style web services for climateprediction.net.
In this paper we describe an architecture which implements call and pass by reference using asynchronous Web Services. This architecture provides a distributed data analysis environment in which functions can be dynamically described and used.
Scaling J2EE™ Application Servers with the Multi-Tasking Virtual Machine
The Java 2 Platform, Enterprise Edition (J2EE) is established as the standard platform for hosting enterprise applications written in the Java programming language. Similar to an operating system, a J2EE server can host multiple applications, but this is rarely seen in practice due to limitations on scalability, weak inter-application isolation and inadequate resource management facilities in the underlying Java platform. This leads to a proliferation of server instances, each typically hosting a single application, with a consequent dramatic increase in the total memory footprint and more complex system administration. The Multi-tasking Virtual Machine (MVM) solves this problem by providing an efficient and scalable implementation of the isolate API for multiple, isolated tasks, enabling the co-location of multiple server instances in a single MVM process. Isolates also enable the restructuring of a J2EE server implementation as a collection of isolated components, offering increased flexibility and reliability. The resulting system is a step towards a complete and scalable operating environment for enterprise applications.
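As a flavor of the programming model, two server instances might be co-located as isolates roughly as follows (a sketch following the JSR-121 Application Isolation API that MVM implements; the class name and arguments are hypothetical):

    import javax.isolate.Isolate;

    public class Colocate {
        public static void main(String[] args) throws Exception {
            // Two "server instances" as isolated tasks inside one MVM process:
            // separate statics and heaps, shared runtime and compiled code.
            Isolate serverA = new Isolate("com.example.AppServer", "--port=8080");
            Isolate serverB = new Isolate("com.example.AppServer", "--port=8081");
            serverA.start();
            serverB.start();
        }
    }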
Supernets and snHubs: A Foundation for Public Utility Computing
The notion of procuring computer services from a utility, much the way we get water and electricity and phone service, is not new. The idea at the center of the public utility trend in computer services is to allow firms to focus less on administering and supporting their information technology and more on running their business. Supernets and their implementation as hardware devices (snHubs) are our approach to make networks part of the public utility computing (PUC) infrastructure. The infrastructure is a key to integrating and enabling such "remote access" constituencies as B2B, out-sourcing vendors, and workers who telecommute in a safe and scalable manner. We have designed, developed, and deployed a prototype whose viability is now being demonstrated by a small deployment throughout Sun Microsystems.
Shedding Light on the Hidden Web
The terms Hidden Web, Deep Web and Invisible Web describe those resources on the Web that are in some way unreachable by search engines, and are potentially unusable to other Web systems such as annotation services. These hidden resources make up a significant part of the current Web. We provide firm definitions of the ways in which information can be "hidden", and discuss the challenges that face those working with annotation in the Hidden Web. We do not attempt to provide solutions for these challenges, but a clarification of the terms involved is certainly a step in the right direction.
Design of JFluid: A Profiling Technology and Tool Based on Dynamic Bytecode Instrumentation
Instrumentation-based profiling has many advantages and one serious disadvantage: usually high performance overhead. This overhead can be substantially reduced if only a small part of the target application (for example, one that has previously been identified as a performance bottleneck) is instrumented, while the rest of the application code runs at full speed. Such an approach can also beat scalability issues caused by a high volume of profiling information generated by instrumented code running on behalf of multiple threads. The value of such a profiling technology would increase further if the code could be instrumented and de-instrumented as many times as needed at run time.
In this report we describe in detail the design of an experimental profiling system called JFluid, which includes a modified Java HotSpot™ VM and a GUI tool, and addresses both of the above issues. Our JVM™ supports arbitrary on-the-fly modifications to running Java methods, and can connect with a profiling tool at any moment, without any startup time preparation. Our tool collects, processes and presents profiling data on-line. To perform CPU profiling, it instruments a group of methods defined as an arbitrary "root" method plus all methods that it calls (a call subgraph). It appears that static determination of all methods in a call subgraph is difficult in the presence of virtual methods, but fortunately, with dynamic code hotswapping available, two schemes of dynamic call subgraph revelation and instrumentation can be suggested.
Measurements that we obtained when performing full and partial program profiling using both schemes show that the overhead can be reduced substantially using this technique, and that one of the schemes generally results in a smaller number of instrumented methods and better performance, especially for large applications.
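The call-subgraph selection can be pictured as a breadth-first walk from the root method (a sketch with a hypothetical CallGraph type; as the report notes, a static walk over-approximates in the presence of virtual methods, which the dynamic schemes address):

    import java.util.ArrayDeque;
    import java.util.LinkedHashSet;
    import java.util.Set;

    final class SubgraphSelector {
        interface CallGraph { Set<String> callees(String method); }

        // Collect the "root method plus everything it calls" set used
        // for partial profiling, by breadth-first traversal.
        static Set<String> methodsToInstrument(CallGraph g, String root) {
            Set<String> selected = new LinkedHashSet<>();
            ArrayDeque<String> work = new ArrayDeque<>();
            work.add(root);
            while (!work.isEmpty()) {
                String m = work.remove();
                if (selected.add(m)) {
                    work.addAll(g.callees(m)); // virtual calls make this a superset;
                }                              // the dynamic schemes refine it at run time
            }
            return selected;
        }
    }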
Securing the Web with the Next Generation Public-Key Cryptosystem
Stanford Networking Research Center (SNRC) industry seminar
Inductive Learning for Fault Diagnosis
There is a steadily increasing need for autonomous systems that must be able to function with minimal human intervention to detect and isolate faults, and recover from such faults. In this paper we present a novel hybrid Model based and Data Clustering (MDC) architecture for fault monitoring and diagnosis, which is suitable for complex dynamic systems with continuous and discrete variables. The MDC approach allows for adaptation of both structure and parameters of identified models using supervised and reinforcement learning techniques. The MDC approach will be illustrated using the model and data from the Hybrid Combustion Facility (HCF) at the NASA Ames Research Center.
Co-evolutionary perception-based reinforcement learning for sensor allocation in autonomous vehicles
In this paper we study the problem of sensor allocation in Unmanned Aerial Vehicles (UAVs). Each UAV uses perception-based rules for generalizing decision strategy across similar states and reinforcement learning for adapting these rules to the uncertain, dynamic environment. A big challenge for reinforcement learning algorithms in this problem is that UAVs need to learn two complementary policies: how to allocate their individual sensors to appearing targets and how to distribute themselves as a team in space to match the density and importance of targets underneath. We address this problem using a co-evolutionary approach, where the policies are learned separately, but they use a common reward function. The applicability of our approach to the UAV domain is verified using a high-fidelity robotic simulator. Based on our results, we believe that the co-evolutionary reinforcement learning approach to reducing dimensionality of the action space presented in this paper is general enough to be applicable to many other multi-objective optimization problems, particularly those that involve a tradeoff between individual optimality and team-level optimality.
Securing the Web with Next Generation Cryptographic Technologies
Internetworking 2003, San Jose, Jun. 2003.
A Cryptographic Processor for Arbitrary Elliptic Curves over GF(2^m)
We describe a cryptographic processor for Elliptic Curve Cryptography (ECC). ECC is evolving as an attractive alternative to other public-key cryptosystems such as the Rivest-Shamir-Adleman algorithm (RSA) by offering the smallest key size and the highest strength per bit. The cryptographic processor performs point multiplication for elliptic curves over binary polynomial fields GF(2^m). In contrast to other designs that only support one curve at a time, our processor is capable of handling arbitrary curves without requiring reconfiguration. More specifically, it can handle both named curves as standardized by the National Institute of Standards and Technology (NIST) as well as any other generic curves up to a field degree of 255. Efficient support for arbitrary curves is particularly important for the targeted server applications that need to handle requests for secure connections generated by a multitude of heterogeneous client devices. Such requests may specify curves which are infrequently used or not even known at implementation time.
We have implemented the cryptographic processor in a field-programmable gate array (FPGA) running at a clock frequency of 66.4 MHz. Its performance is 6955 point multiplications per second for named curves over GF(2^163) and 3308 point multiplications per second for generic curves over GF(2^163). We have integrated the cryptographic processor into the open-source toolkit OpenSSL, which implements the Secure Sockets Layer (SSL), today's dominant Internet security protocol.
This report is an extended version of a paper presented at the IEEE 14th International Conference on Application-specific Systems, Architectures and Processors, The Hague, June 2003 where it received the "Best Paper Award".
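For context, the point multiplication that dominates ECC cost is, at its simplest, a double-and-add loop over the scalar bits (a sketch with a hypothetical EcPoint type; hardware like the processor described here typically uses more uniform, timing-resistant methods):

    import java.math.BigInteger;

    final class ScalarMul {
        interface EcPoint {            // hypothetical curve-point type
            EcPoint twice();           // point doubling
            EcPoint add(EcPoint other);
        }

        // Computes k*P by left-to-right double-and-add for k > 0.
        // Not constant-time: the key-dependent branch leaks timing,
        // which is why real designs prefer ladder-style algorithms.
        static EcPoint multiply(BigInteger k, EcPoint p) {
            if (k.signum() <= 0) throw new IllegalArgumentException("k must be positive");
            EcPoint result = p;                          // consume the top set bit
            for (int i = k.bitLength() - 2; i >= 0; i--) {
                result = result.twice();
                if (k.testBit(i)) result = result.add(p);
            }
            return result;
        }
    }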
Project JXTA: A Loosely-Consistent DHT Rendezvous Walker
The open-source community Project JXTA defines an open set of standard protocols for ad hoc, pervasive, peer-to-peer (P2P) computing as a common platform for developing a wide variety of decentralized network applications. The following paper describes a loosely-consistent DHT walker approach for searching advertisements and routing queries in the JXTA rendezvous network. The loosely-consistent DHT walker uses a hybrid approach that combines the use of a DHT to index and locate contents with a limited-range walker to resolve inconsistency of the DHT within the dynamic rendezvous network. This proposed DHT approach does not require maintaining consistency across the rendezvous network or a stable super-peer infrastructure, and is well adapted to ad hoc P2P networks with high peer churn rates.
Towards a Java™-Based Enterprise Client for Small Devices
The goal of the work reported here was to explore the use of the Java 2 Micro Edition (J2ME™) platform for applications connected to the enterprise, specifically focusing on Palm-based wireless applications. We found that the Java™ platform on the Palm is still maturing. The Palm itself has been carefully engineered to support small native applications, with a distinctive graphical user interface tuned for its display. Work remains to be done on the Palm to support more complex wireless applications and to make Java-based applications competitive. We also found that wireless enterprise applications in general are somewhat problematic, due to issues of network reliability, availability, bandwidth, and provisioning. Significantly, programming languages and their platforms are not the gating factors to large scale wireless deployment.
This work was performed in 2000 and 2001, before the current commercial deployment of Java-enabled mobile devices and faster wide-area wireless data services (such as GPRS). We hope to repeat our experiments using these technologies.
Radioport: A Radio Network for Monitoring and Diagnosing Computer Systems
A radio network is described for configuring, monitoring, and diagnosing the components of a computer system. Such a network offers several advantages: (a) It improves the robustness of the overall system by not having the monitoring functions rely on the interconnect of the monitored system; (b) by broadcasting information, it offers direct communication between the monitoring and monitored components thereby removing dependencies inherent to hierarchical and daisy-chained wired networks; (c) it does not rely on a physical interconnect thereby lowering implementation cost, offering non-intrusive monitoring, and improving reliability thanks to the lack of error- and failure-prone cables and connectors.
This report is an extended version of a paper presented at HOTI 2002, Stanford, California, August 2002. It received the Most Interesting New Topic Award.
The Least Choice First (LCF) Scheduling Method for High-speed Network Switches
We describe a novel method for scheduling high-speed network switches. The targeted architecture is an input-buffered switch with a non-blocking switch fabric. The input buffers are organized as virtual output queues to avoid head-of-line blocking. The task of the scheduler is to decide when the input ports can forward packets from the virtual output queues to the corresponding output ports. Our Least Choice First (LCF) scheduling method selects the input and output ports to be matched by prioritizing the input ports according to the number of virtual output queues that contain packets: The fewer virtual output queues with packets, the higher the scheduling priority of the input port. This way, the number of switch connections and, with it, switch throughput is maximized. Fairness is provided through the addition of a round-robin algorithm.
We present two alternative implementations: A central implementation intended for narrow switches and a distributed implementation based on an iterative algorithm intended for wide switches.
The simulation results show that the LCF scheduler outperforms other scheduling methods such as the parallel iterative matcher [1], iSLIP [12], and the wave front arbiter [16].
This report is an extended version of a paper presented at IPDPS 2002, Fort Lauderdale, Florida, April 2002.
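The priority rule can be made concrete with a small sketch of one centralized matching round (names are ours; the paper's scheduler additionally uses round-robin tie-breaking for fairness and an iterative variant for wide switches):

    final class LcfSketch {
        // voq[i][j] holds the number of packets queued at input i for
        // output j. Inputs with the fewest non-empty VOQs are matched
        // first, which tends to maximize the number of connections.
        static int[] lcfMatch(int[][] voq) {
            int n = voq.length;
            int[] choices = new int[n];
            Integer[] order = new Integer[n];
            for (int i = 0; i < n; i++) {
                order[i] = i;
                for (int j = 0; j < n; j++) if (voq[i][j] > 0) choices[i]++;
            }
            java.util.Arrays.sort(order, (a, b) -> choices[a] - choices[b]);
            int[] match = new int[n];
            java.util.Arrays.fill(match, -1);        // -1 = input left unmatched
            boolean[] outputTaken = new boolean[n];
            for (int i : order) {
                for (int j = 0; j < n; j++) {
                    if (voq[i][j] > 0 && !outputTaken[j]) {
                        match[i] = j;
                        outputTaken[j] = true;
                        break;
                    }
                }
            }
            return match;
        }
    }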
Separated High-bandwidth and Low-latency Communication in the Cluster Interconnect Clint
An interconnect for a high-performance cluster has to be optimized with respect to both high throughput and low latency. To avoid a tradeoff between throughput and latency, the cluster interconnect Clint has a segregated architecture that provides two physically separate transmission channels: a bulk channel optimized for high-bandwidth traffic and a quick channel optimized for low-latency traffic. The two channels use different scheduling strategies. The bulk channel uses a scheduler that globally allocates time slots on the transmission paths before packets are sent off; in this way, collisions as well as blockages are avoided. In contrast, the quick channel takes a best-effort approach by sending packets whenever they are available, thereby risking collisions and retransmissions.
Clint is targeted specifically at small- to medium-sized clusters offering a low-cost alternative to symmetric multiprocessor (SMP) systems. This design point allows for a simple and cost-effective implementation. In particular, by buffering packets only on the hosts and not requiring any buffer memory on the switches, protocols are simplified as switch forwarding delays are fixed, and throughput is optimized as the use of a global schedule is now possible.
This report is an extended version of a paper presented at SC2002, Baltimore, Maryland, November 2002.
Adaptive Coordination Among Fuzzy Reinforcement Learning Agents Performing Distributed Dynamic Load Balancing
In this paper we present an adaptive multi-agent coordination algorithm applied to the problem of distributed dynamic load balancing. As a specific example, we consider the problem of dynamic web caching in the Internet. In our general formulation of this problem, each agent represents a mirrored piece of content that tries to move itself closer to areas of the network with a high demand for this item. Each agent in our model uses a fuzzy rulebase for choosing the optimal direction of motion and adjusts the parameters of this rulebase using reinforcement learning. The resulting architecture for multi-agent coordination among fuzzy reinforcement learning agents (MAC-FRL) allows the team of agents to adaptively redistribute its members in the environment to match the changing pattern of demand. We simulate the performance of MAC-FRL and show that it significantly improves performance over non-coordinating agents.
Developing Secure Web Applications for Constrained Devices
Invited presentation at the 11th World Wide Web Conference, Hawaii, May 2002.
On the Design of a New CPU Architecture for Pedagogical Purposes
Ant-32 is a new processor architecture designed specifically to address the pedagogical needs of teaching many subjects, including assembly language programming, machine architecture, compilers, operating systems, and VLSI design. This paper discusses our motivation for creating Ant-32 and the philosophy we used to guide our design decisions and gives a high-level description of the resulting design.
Experiments in Wireless Internet Security
Proc. of IEEE Wireless Communications and Networking Conference (WCNC), Orlando, Mar. 2002.
Walkabout: A Retargetable Dynamic Binary Translation Framework
Dynamic compilation techniques have found a renaissance in recent years due to their use in high-performance implementations of the Java™ language. Techniques originally developed for use in virtual machines for such object-oriented languages as Smalltalk are now commonly used in Java virtual machines (JVM™) and Java just-in-time compilers. These techniques have also been applied to binary translation in recent years, most commonly appearing in binary optimizers for a given platform that improve the performance of binary programs while they execute.
The Walkabout project investigates and develops dynamic binary translation techniques that are based on properties of retargetability, ease of experimentation, separation of machine-dependent from machine-independent concerns, and good debugging support. Walkabout is a framework for experimenting with dynamic binary translation ideas, as well as techniques in related areas such as interpreters, instrumentation tools, and optimization.
In this report, we present the design of the Walkabout framework and its initial implementation. Tools generated from this initial framework include disassemblers, machine code interpreters (emulators), and binary rewriting tools for the SPARC® and x86 architectures.
Experience in the Design, Implementation and Use of a Retargetable Static Binary Translation Framework
Binary translation, the process of translating binary executables, makes it possible to run code compiled for source (input) machine Ms on target (output) machine Mt. Unlike an interpreter or emulator, a binary translator makes it possible to approach the speed of native code on machine Mt. Translated code may still run slower than native code because low-level properties of machine Ms must often be modeled on machine Mt.
The University of Queensland Binary Translation (UQBT) framework is a retargetable framework for experimenting with static binary translation on CISC and RISC machines. The system was built jointly by The University of Queensland and Sun Microsystems Laboratories in order to experiment with translations to and from different machines, to understand how to migrate applications from other UNIX-based platforms to a (SPARC®, Solaris™) platform, and to experiment with translations from the current SPARC architecture to a future, not yet existing, version of the SPARC architecture.
This paper describes the overall design and architecture of the UQBT framework, the goals for the project, the resulting framework, experiences with translations across different machines, and lessons learned.
A Transformational Approach to Binary Translation of Delayed Branches with Applications to SPARC® and PA-RISC Instruction Sets
A binary translator examines binary code for a source machine, optionally builds an intermediate representation, and generates code for a target machine. Understanding what to do with delayed branches in binary code can involve tricky case analyses, e.g., if there is a branch instruction in a delay slot. Correctness of a translation is of utmost importance. This paper presents a disciplined method for deriving such case analyses. The method identifies problematic cases, shows the translations for the non-problematic cases, and gives confidence that all cases are considered. The method supports such common architectures as SPARC®, MIPS, and PA-RISC.
We begin by writing a very simple interpreter for the source machine's code. We then transform the interpreter into an interpreter for a target machine without delayed branches. To maintain the semantics of the program being interpreted, we simultaneously transform the sequence of source-machine instructions into a sequence of target-machine instructions. The transformation of the instructions becomes our algorithm for binary translation. We show the translation is correct by reasoning about corresponding states on source and target machines.
Instantiation of this algorithm to the SPARC V8 and PA-RISC V1.1 architectures is shown. Of interest, these two machines share seven of 11 classes of delayed branching semantics; the PA-RISC has three classes which are not available in the SPARC architecture, and the SPARC architecture has one class which is not available in the PA-RISC architecture.
Although the delayed branch is an architectural idea whose time has come and gone, the method is significant to anyone who must write tools that deal with legacy binaries. For example, translators using this method could run PA-RISC on the new IA-64 architecture, or they may enable architects to eliminate delayed branches from a future version of the SPARC architecture.
*This report is a very extended version of TR 440, Department of Computer Science and Electrical Engineering, The University of Queensland, Dec 1998, and describes applications of the technique to translations of SPARC® and PA-RISC codes. This report fully documents the translation algorithms for these machines.
Securing the Wireless Internet
IEEE Communications Magazine, pp. 68-74.
KSSL: Experiments in Wireless Internet Security
Internet-enabled wireless devices continue to proliferate and are expected to surpass traditional Internet clients in the near future. This has opened up exciting new opportunities in the mobile e-commerce market. However, data security and privacy remain major concerns in the current generation of "wireless web" offerings. All such offerings today use a security architecture that lacks end-to-end security. This unfortunate choice is driven by perceived inadequacies of standard Internet security protocols like SSL (Secure Sockets Layer) on less capable CPUs and low-bandwidth wireless links.
This report presents our experiences in implementing and using standard security mechanisms and protocols on small wireless devices. We have created new classes for the Java 2 Micro-Edition (J2ME™) platform that offer fundamental cryptographic operations such as message digests and ciphers as well as higher-level security protocols like SSL. Our results show that SSL is a practical solution for ensuring end-to-end security of wireless Internet transactions even within today's technological constraints.
Parallel Garbage Collection For Shared Memory Multiprocessors
We present a multiprocessor "stop-the-world" garbage collection framework that provides multiple forms of load balancing. Our parallel collectors use this framework to balance the work of root scanning, using static overpartitioning, and also to balance the work of tracing the object graph, using a form of dynamic load balancing called work stealing. We describe two collectors written using this framework: pSemispaces, a parallel semispace collector, and pMarkcompact, a parallel mark-compact collector.
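The tracing phase's dynamic load balancing can be sketched as follows (illustrative Java; the types and the termination check are simplified stand-ins for the framework's own mechanisms):

    import java.util.concurrent.ConcurrentLinkedDeque;
    import java.util.concurrent.ThreadLocalRandom;

    final class TraceWorker {
        interface HeapObject {
            boolean tryMark();               // atomically mark; true if previously unmarked
            Iterable<HeapObject> children(); // outgoing references
        }

        // Each GC thread pops from its own deque and, when that runs dry,
        // steals from the opposite end of a random victim's deque.
        static void trace(int self, ConcurrentLinkedDeque<HeapObject>[] deques) {
            ConcurrentLinkedDeque<HeapObject> mine = deques[self];
            while (true) {
                HeapObject obj = mine.pollLast();                 // local work first
                if (obj == null) {
                    int victim = ThreadLocalRandom.current().nextInt(deques.length);
                    obj = deques[victim].pollFirst();             // steal from the top
                    if (obj == null) break;                       // simplified termination;
                }                                                 // real collectors check globally
                for (HeapObject child : obj.children()) {
                    if (child.tryMark()) mine.addLast(child);
                }
            }
        }
    }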
Safe Class and Data Evolution in Large and Long-Lived Java™ Applications
There is a growing class of applications implemented in object-oriented languages that are large and complex, that exploit object persistence, and need to run uninterrupted for long periods of time. Development and maintenance of such applications can present challenges in the following interrelated areas: consistent and scalable evolution of persistent data and code, optimal build management, and runtime changes to applications.
The research presented in this thesis addresses the above issues. Since the Java™ programming language is becoming an increasingly popular platform for implementing large and long-lived applications, it was chosen for the experiments.
Bringing Big Security to Small Devices
JavaOne 2001, San Francisco, Jun. 2001.
Securing J2ME Applications
Invited presentation at the J2ME Wireless Headquarter briefing, Menlo Park, Apr 2001.
Automated and Portable Native Code Isolation
The coexistence of programs written in a safe language with user-supplied unsafe (native) code is convenient (it enables direct access to hardware and operating system resources and can improve application performance), but at the same time it is problematic (it leads to undesirable interference with the language runtime, decreases overall reliability, and lowers debuggability). This work aims at retaining most of the benefits of interfacing a safe language with native code while addressing its problems. It is carried out in the context of the Java™ Native Interface (JNI).
Our approach is to execute the native code in an operating system process different from that of the safe language application. A technique presented in this paper accomplishes this transparently, automatically, and without sacrificing any of the JNI functionality. No changes to the Java virtual machine (JVM™) or its runtime are necessary. The resulting prototype does not depend on a particular implementation of the JVM, and is highly portable across hardware architectures and operating systems. This approach can readily be used to improve reliability of applications consisting of a mix of safe and native code; to enable the execution of user-supplied native code in multitasking systems based on safe languages and in embedded virtual machines; and to facilitate mixed-mode debugging, without the need to re-implement any of the components of the language runtime. The design and implementation of a prototype system, performance implications, and the potential of this architecture are discussed in the paper.
Orthogonal Persistence for the Java™ Platform: Specification and Rationale
Orthogonal persistence provides the programmer with persistence for all data types, with minimal impact on the programming model or development process. We motivate the addition of orthogonal persistence to the Java™ platform, and show how this results in a simple and appealing application development model. The overall goal is to provide the illusion of continuous computation in the face of system shutdowns, planned or unplanned. This is achieved by checkpointing the state of the system periodically to stable memory.
We describe how the principles of orthogonal persistence are applied to the Java™ programming language and specify the small set of changes to the Java language specification and core libraries necessary to fulfill these principles. We describe the rationale for our particular choices, informed by the experience with the PJama prototype implementations. Finally, the programming model for managing state that is external to the Java™ virtual machine is discussed in detail.
Mob Software: The Erotic Life of Code
Keynote talk at ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications, October 19, 2000, Minneapolis, Minnesota, USA.
Even Better DCAS-Based Concurrent Deques
The computer industry is examining the use of strong synchronization operations such as double compare-and-swap (DCAS) as a means of supporting non-blocking synchronization on tomorrow's multiprocessor machines. However, before such a primitive will be incorporated into hardware design, its utility needs to be proven by developing a body of effective non-blocking data structures using DCAS.
In a previous paper, we presented two linearizable non-blocking implementations of concurrent deques (double-ended queues) using the DCAS operation. These improved on previous algorithms by nearly always allowing unimpeded concurrent access to both ends of the deque while correctly handling the difficult boundary cases when the deque is empty or full. A remaining open question was whether, using DCAS, one can design a non-blocking implementation of concurrent deques that allows dynamic memory allocation but also uses only a single DCAS per push or pop in the best case.
This paper answers that question in the affirmative. We present a new non-blocking implementation of concurrent deques using the DCAS operation. This algorithm provides the benefits of our previous techniques while overcoming drawbacks. Like our previous approaches, this implementation relies on automatic storage reclamation to ensure that a storage node is not reclaimed and reused until it can be proved that the node is not reachable from any thread of control. This algorithm uses a linked-list representation with dynamic node allocation and therefore does not impose a fixed maximum capacity on the deque. It does not require the use of a "spare bit" in pointers. In the best case (no interference), it requires only one DCAS per push and one DCAS per pop. We also sketch a proof of correctness.
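For readers unfamiliar with the primitive: DCAS atomically compares and updates two independently chosen memory words. A semantic sketch (the lock exists only to specify the behavior; the algorithms assume a hardware primitive with these semantics):

    final class Dcas {
        static final class Ref<T> { T value; }    // a mutable memory word
        private static final Object LOCK = new Object();

        // Succeeds, updating both words, only if both hold their
        // expected values; otherwise changes nothing.
        static <A, B> boolean dcas(Ref<A> r1, A expect1, A update1,
                                   Ref<B> r2, B expect2, B update2) {
            synchronized (LOCK) {
                if (r1.value == expect1 && r2.value == expect2) {
                    r1.value = update1;
                    r2.value = update2;
                    return true;
                }
                return false;
            }
        }
    }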
KSSL: A Secure Socket Layer (SSL) implementation for small devices
Invited presentation at the WAP Forum's Security Group meeting, Hong Kong, Sep. 2000.
End-to-end Security for Small Devices
48th IETF meeting, TLS working group, Pittsburgh, Sep. 2000.
Advantages of cooperation between reinforcement learning agents in difficult stochastic problems
This paper presents the first results in understanding the reasons for cooperative advantage between reinforcement learning agents. We consider a cooperation method which consists of using and updating a common policy. We tested this method on a complex fuzzy reinforcement learning problem and found that cooperation brings larger than expected benefits. More precisely, we found that K cooperative agents each learning for N time steps outperform K independent agents each learning in a separate world for K*N time steps. We explain the observed phenomenon and determine the necessary conditions for its presence in a wide class of reinforcement learning problems.
Displaying and Editing Source Code in Software Engineering Environments
Second International Symposium on Constructing Software Engineering Tools (CoSET'2000), Limerick, Ireland, June 5, 2000
Abstract:
Source code plays a major role in most software engineering environments. The interface of choice between source code and human users is a tool that displays source code textually and possibly permits its modification. Specializing this tool for the source code's language promises enhanced services for programmers as well as better integration with other tools. However, these two goals, user services and tool integration, present conflicting design constraints that have previously prevented specialization. A new architecture, based on a lexical representation of source code, represents a compromise that satisfies constraints on both sides. A prototype implementation demonstrates that the technology can be implemented using current graphical toolkits, can be made highly configurable using current language analysis tools, and that it can be encapsulated in a manner consistent with reuse in many software engineering contexts.
A Review of the Rationale and Architectures of PJama: a Durable, Flexible, Evolvable and Scalable Orthogonally Persistent Programming Platform
A primary goal of research into orthogonal persistence is to simplify significantly the construction, maintenance and operation of applications in order to save software costs, extend the range of applications and improve users' experiences. To test such claims we need relevant experiments. To mount such experiments requires an industrial-strength persistent programming platform. The PJama project is an attempt to build such a platform and initiate those experiments. We report our design decisions and their consequences evaluated by four years of experience. We have reached a range of platforms, demonstrated orthogonality and provided durability, schema evolution with instance reformatting, platform migration and recovery. The application programming interface is now close to minimal, while we support open systems through a resumable-programming model. Our architecture is flexible and supports a range of optimisations. Performance measurements and current applications attest to our progress, but it is still possible to identify major research questions, and the experiments to test the utility of orthogonal persistence are still in their early stages.
Challenges Facing Mobile IP
Invited presentation at the Mobile IP Conference, London, Jan. 2000.
Cooperation and coordination between fuzzy reinforcement learning agents in continuous state partially observable Markov decision processes
We consider a pseudo-realistic world in which one or more opportunities appear and disappear in random locations. Agents use fuzzy reinforcement learning to learn which opportunities are most worthy of pursuing based on their promised rewards, expected lifetimes, path lengths and expected path costs. We show that this world is partially observable because the history of an agent influences the distribution of its future states. We implement a coordination mechanism for allocating opportunities to different agents in the same world. Our results show that optimal team performance results when agents behave in a partially selfish way. We also implement a cooperation mechanism in which agents share experience by using and updating one joint behavior policy. Our results demonstrate that K cooperative agents each learning in a separate world for N time steps outperform K independent agents each learning in a separate world for K*N time steps, with this result becoming more pronounced as the degree of partial observability in the environment increases.
Flexible Authentication for DHCP Messages
45th IETF meeting, Oslo, Norway, Jul. 1999.
An Efficient Meta-lock for Implementing Ubiquitous Synchronization
Programs written in concurrent object-oriented languages, especially ones that employ thread-safe reusable class libraries, can execute synchronization operations (lock, notify, etc.) at an amazing rate. Unless implemented with utmost care, synchronization can become a performance bottleneck. Furthermore, in languages where every object may have its own monitor, per-object space overhead must be minimized. To address these concerns, we have developed a meta-lock to mediate access to synchronization data. The meta-lock is fast (lock + unlock executes in 11 SPARC™ instructions), compact (uses only two bits of space), robust under contention (no busy-waiting), and flexible (supports a variety of higher-level synchronization operations). We have validated the meta-lock with an implementation of the synchronization operations in a high-performance product-quality Java™ virtual machine and report performance data for several large programs.
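A sketch of the header-word idea in Java terms (hypothetical layout and names; unlike this simplified spin loop, the actual meta-lock avoids busy-waiting by handing ownership off between contending threads):

    import java.util.concurrent.atomic.AtomicLong;

    final class MetaLock {
        static final long META_MASK = 0b11;   // two low header bits form the meta-lock
        static final long UNLOCKED  = 0b00;
        static final long LOCKED    = 0b01;
        final AtomicLong header = new AtomicLong();

        // Acquire the meta-lock, returning the displaced header bits that
        // describe the object's real synchronization state.
        long acquire() {
            while (true) {
                long h = header.get();
                if ((h & META_MASK) == UNLOCKED
                        && header.compareAndSet(h, (h & ~META_MASK) | LOCKED)) {
                    return h & ~META_MASK;
                }
                Thread.onSpinWait();          // the real meta-lock never busy-waits
            }
        }

        void release(long displaced) {
            header.set((displaced & ~META_MASK) | UNLOCKED);
        }
    }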
Secure, Remote Access over the Internet using IPSec
44th IETF meeting, Minnesota, (BOF on IPsec based Remote Access (IPSRA)), Mar. 1999.
The Spotless System: Implementing a Java™ System for the Palm Connected Organizer
The majority of recent Java implementations have been focused on speed. There are, however, a large number of consumer and industrial devices and embedded systems that would benefit from a small Java implementation supporting the full bytecode set and dynamic class loading. In this report we describe the design and implementation of the Spotless system, which is based on a new Java virtual machine developed at Sun Labs and targeted specifically at small devices such as personal organizers, cellular telephones, and pagers. We also discuss a set of basic class libraries we developed that supports small applications, and describe the version of the Spotless system that runs on the Palm Connected Organizer.
Virtual Collaborative Learning: A Comparison between Face-to-Face Tutored Video Instruction (TVI) and Distributed Tutored Video Instruction (DTVI)
Tutored Video Instruction (TVI) is a collaborative learning methodology in which a small group of students studies a videotape of a lecture. We constructed a fully virtual version of TVI called Distributed Tutored Video Instruction (DTVI), in which each student has a networked computer with audio microphone-headset and video camera to support communication within the group. In this report, we compare survey questionnaires, observations of student interactions, and grade outcomes for students in the face-to-face TVI condition with those of students in the DTVI condition. Our analysis also includes comparisons with students in the original lecture. This two and a half year study involved approximately 700 students at two universities. Despite finding a few statistically significant process differences between TVI and DTVI, the interactions were for the most part quite similar. Course grade outcomes for TVI and DTVI were indistinguishable, and these collaborative conditions proved better than lecture. We conclude that this kind of highly interactive virtual collaboration can be an effective way to learn.
The GC Interface in the EVM
This document describes how to write a garbage collector (GC) for the EVM. It assumes that the reader has a good understanding of garbage collection issues and some familiarity with the Java™ language. The EVM is part of a research project at Sun Labs. The interfaces described in this document are under development and are guaranteed to change. In fact, the purpose of this document is to solicit feedback to improve the interfaces described herein. As a result, specific product plans should not be based on this document; everything is expected to change.
(EVM, the Java virtual machine known previously as ExactVM, is embedded in Sun's Java 2 SDK Production Release for Solaris™, available at http://www.sun.com/solaris/java/.)
Proceedings of the Second International Workshop on Persistence and Java
These proceedings record the Second International Workshop on Persistence and Java, which was held in Half Moon Bay in the San Francisco Bay Area, in August 1997. The focus of the workshop series is the relationship between the Java platform and long-term storage, such as databases and orthogonal persistence. If future application programmers building large and long-lived systems are to be well supported, it is essential that the lessons of existing research into language and persistence combinations are utilized, and that the research community develops further results needed for the Java platform. The initial idea for the workshop series came from Malcolm Atkinson, who leads the Persistence and Distribution Research Group at Glasgow University, and who is a Visiting Professor at SunLabs. The workshop series is one of the fruits of the collaboration between the Forest group at SunLabs, led by Mick Jordan, and the Glasgow group.
Internet Security Mechanisms
Invited presentation at Nomadic '97 as Chairperson of a session on Internet Security, Aug. 1997.
Security for Mobile Users
Half-day tutorial presentation at Nomadic '97, Aug. 1997
First International Workshop on Persistence and Java
These proceedings record the First International Workshop on Persistence and Java, which was held in Drymen, Scotland in September 1996. The focus of this workshop was the relationship between the Java language and long-term data storage, such as databases and orthogonal persistence. There are many approaches being taken, some pragmatic and some guided by design principles. If future application programmers building large and long-lived systems are to be well supported, it is essential that the lessons of existing research into language and database combinations are utilized, and that the research community develops further results needed for Java.
The initial idea for the workshop came from Malcolm Atkinson, who leads the Persistence and Distribution Research group at Glasgow University. The idea was one of the first fruits of the collaborative research program that was initiated between Sun Microsystems Laboratories (SunLabs) and the Glasgow group in the fall of 1995. SunLabs sponsored the workshop to cover the attendees' local costs.
Software Configuration Management in an Object Oriented Database
USENIX Conference on Object-Oriented Technologies (COOTS), Monterey, CA, June 26-29, 1995
Abstract:
The task of configuration management for software development environments is not well supported by conventional files, directories, and ad hoc persistence mechanisms. Typed, immutable objects combined with ubiquitous versioning provide a much more sound basis. A prototype configuration management system currently under development relies upon persistent objects provided by a commercial object-oriented database system. Mechanisms and policies developed for this prototype simplify programming with persistent objects; results include simplicity of both design and implementation, as well as flexibility, and extensibility. Initial measurements suggest that performance is acceptable.
Combining Message Switching with Circuit Switching in the Interconnection Cached Multiprocessor Network
in Proc. of the 1994 Int'l Symposium on Parallel Architecture, Algorithms, and Networks (ISPAN), Kanazawa, Japan, Dec. 1994, pp. 143-150.
Self: The Power of Simplicity
Self is an object-oriented language for exploratory programming based on a small number of simple and concrete ideas: prototypes, slots, and behavior. Prototypes combine inheritance and instantiation to provide a framework that is simpler and more flexible than most object-oriented languages. Slots unite variables and procedures in a single construct. This permits the inheritance hierarchy to take over the function of lexical scoping in conventional languages. Finally, because Self does not distinguish state from behavior, it narrows the gaps between ordinary objects, procedures, and closures. Self's simplicity and expressiveness offer new insights into object-oriented computation.
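For illustration, the prototype-and-slots model can be sketched in a few lines of Java (hypothetical names; Self itself needs no such machinery):

    import java.util.HashMap;
    import java.util.Map;

    final class ProtoObject {
        final Map<String, Object> slots = new HashMap<>();
        ProtoObject parent;                    // inheritance is just a parent slot

        // Message lookup walks the parent chain, so the inheritance
        // hierarchy plays the role that lexical scoping plays elsewhere.
        Object lookup(String name) {
            for (ProtoObject o = this; o != null; o = o.parent) {
                if (o.slots.containsKey(name)) return o.slots.get(name);
            }
            throw new RuntimeException("message not understood: " + name);
        }

        // Instantiation and inheritance are one mechanism: clone a prototype.
        ProtoObject copy() {
            ProtoObject child = new ProtoObject();
            child.parent = this;
            return child;
        }
    }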
Counterflow Pipeline Processor Architecture
The counterflow pipeline processor architecture (CFPP) is a proposal for a family of microarchitectures for RISC processors. The architecture derives its name from its fundamental feature, namely that instructions and results flow in opposite directions within a pipeline and interact as they pass. The architecture seeks geometric regularity in processor chip layout, purely local control to avoid performance limitations of complex global pipeline stall signals, and simplicity that might lead to provably correct processor designs. Moreover, CFPP designs allow asynchronous implementations, in contrast to conventional pipeline designs where the synchronization required for operand forwarding makes asynchronous designs unattractive. This paper presents the CFPP architecture and a proposal for an asynchronous implementation. Detailed performance simulations of a complete processor design are not yet available.
Keywords: processor design, RISC architecture, micropipelines, FIFO, asynchronous systems
CR Categories: B.2.1, B.6.1, C.1.0
Subcontract: A Flexible Base for Distributed Programming
A key problem in operating systems is permitting the orderly introduction of new properties and new implementation techniques. We describe a mechanism, subcontract, that within the context of an object-oriented distributed system permits application programmers control over fundamental object mechanisms. This allows programmers to define new object communication mechanisms without modifying the base system. We describe how new subcontracts can be introduced as alternative communication mechanisms in the place of existing subcontracts. We also briefly describe some of the uses we have made of the subcontract mechanism to support caching, crash recovery, and replication.
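A hypothetical sketch of the shape such a mechanism might take (names are ours, not the paper's API): the base system routes remote invocations through a pluggable object, so a new policy is a new implementation rather than a change to the base system.

    interface Subcontract {
        byte[] marshal(Object obj);             // encode an object reference
        Object unmarshal(byte[] wireForm);      // decode on the receiving side
        Object invoke(Object target, String method, Object[] args) throws Exception;
    }

    // Example policy: replication, layered over existing subcontracts.
    final class ReplicatingSubcontract implements Subcontract {
        private final Subcontract[] replicas;
        ReplicatingSubcontract(Subcontract[] replicas) { this.replicas = replicas; }
        public byte[] marshal(Object obj) { return replicas[0].marshal(obj); }
        public Object unmarshal(byte[] w) { return replicas[0].unmarshal(w); }
        public Object invoke(Object target, String method, Object[] args) throws Exception {
            Object result = null;
            for (Subcontract r : replicas) result = r.invoke(target, method, args);
            return result;                      // broadcast the call to every replica
        }
    }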
New Approaches to Digital Optical Computing Using Parallel Optical Array Logic
in Photonics in Switching (Vol. I), J. Midwinter, ed., Academic Press, Inc., 1993, pp. 195-223.
User Interaction in Language-Based Editing Systems
Ph.D. Dissertation, Computer Science Division, EECS, University of California Berkeley, December 1992. Published as Technical Report UCB/CSD-93-726
Abstract:
Language-based editing systems allow users to create, browse, and modify structured documents (programs in particular) in terms of the formal languages in which they are written. Many such systems have been built, but despite steady refinement of the supporting technology few programmers use them today. In this dissertation it is argued that realizing the potential of these systems demands a user-centered approach to their design and construction. Pan, a fully-implemented experimental language-based editing and browsing system, demonstrates the viability of the approach.
Careful consideration of the intended user population, drawing on evidence from psychological studies of programmers, from current software engineering practice, and from experience with earlier systems, motivates Pan's design. Important aspects of that design include functional requirements, metaphors that capture the feel of the system from the perspective of users, and an architectural framework for implementation.
Unlike many earlier systems, Pan's design hides the complexity of language-based technology behind a set of simple and appropriate conceptual models: models of the system and of the documents being viewed. Responding to the true bottleneck in software production, Pan's services are designed to help users understand software rather than save keystrokes writing it. Furthermore, Pan's design framework provides services that degrade gracefully in the presence of malformed documents, incomplete documents, and inconsistent information.
This research has yielded new insight into the design problem at all levels: the suitability of current language-based technology for interactive, user-centered applications; appropriate kernel mechanisms for building coherent user services; new conceptual models of editing that blend textual and structural operations without undue complexity; and the crucial role of local, site-specific design in the delivery of language-based editing services.
A Hardware Compiler for Digital Optical Computing
Optical Computing, Technical Digest Series (Optical Society of America, Washington, D.C.), Mar. 1991, pp. 191-194.
Logic and Interconnections in Optical Computers
Photonics Spectra, pp. 129-134.
A solution to mapping an ASIC design hierarchy into an efficient block-place-and-route layout hierarchy
Netpar, a netlist partitioner tool developed to speed up and automate the process of layout partitioning and preparation, is described. The Netpar partitioning commands allow the user to quickly convert the netlist into a good layout hierarchy. Instead of recapturing schematics, the user can easily direct Netpar to restructure the netlist for layout compatibility. Additionally, an automatic partitioner is available that attempts to equalize block sizes and minimize interconnect. It implements well-documented and tested algorithms for generating optimally partitioned netlists. Netpar's automatic and manual commands can be used to quickly modify hierarchy to improve design performance, turnaround times, and densities.
Qlisp: Parallel Processing in Lisp
One of the major problems in writing programs to take advantage of parallel processing has been the lack of good multiprocessing languages: one which is both powerful and understandable to programmers. In this paper we describe multiprocessing extensions to Common Lisp designed to be suitable for studying styles of parallel programming at the medium-grain level in a shared-memory architecture. The resulting language is called Qlisp. A problem with parallel programming is the degree to which the programmer must explicitly address synchronization problems. Two new approaches to this problem look promising: the first is the concept of heavyweight futures, and the second is a new type of function called a partially, multiply invoked function.
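A rough Java analogue of the future idea, where a computation is spawned in parallel and only awaited when its value is actually touched (illustrative only; Qlisp expresses this directly in Lisp):

    import java.util.concurrent.CompletableFuture;

    public class Futures {
        public static void main(String[] args) {
            CompletableFuture<Integer> f =
                CompletableFuture.supplyAsync(() -> fib(30)); // spawn in parallel
            int other = fib(28);                              // caller keeps working
            int total = other + f.join();                     // touch: wait for the value
            System.out.println(total);
        }
        static int fib(int n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }
    }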
An OR-parallel Model and its Implementation for Prolog
Computer Science and Informatics, Journal of the Computer Society of India, Vol. 18, No. 2, pp. 29-42
A Comparison of Two Network-Based File Servers
We compare the Cambridge File System and the Xerox PARC transactional file system in terms of their structures and resulting performance.
Report on the Programming Language Euclid
SIGPLAN Notices 12, with G.J. Popek, J.J. Horning, B.W. Lampson, and R.L. London
On the Problem of Uniform References to Data Structures
IEEE Transactions on Software Engineering SE-1, with C.M. Geschke
A Formalization and Correctness Proof of the CGOL Language System [Pratt Parser]
Technical Report MIT-LCS-TR-147, Laboratory for Computer Science, Massachusetts Institute of Technology, March 1975.
Abstract:
In many important ways the design and implementation of programming languages are hindered rather than helped by BNF. We present an alternative meta-language based on the work of Pratt which retains much of the effective power of BNF but is more convenient for designer, implementer, and user alike. Its amenability to formal treatment is demonstrated by a rigorous correctness proof of a simple implementation.
Note: The parsing technology embedded in the CGOL Language System later became known as the "Pratt Parser".
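The parsing technique itself is compact enough to sketch: each operator carries a binding power, and a recursive parse loop consumes operators only while they bind at least as tightly as the current context (a minimal Java illustration, not the CGOL system):

    public class Pratt {
        private final String[] toks;
        private int pos = 0;
        Pratt(String s) { toks = s.trim().split("\\s+"); }

        // Parse an expression whose operators all bind at least minBp.
        int parse(int minBp) {
            int lhs = Integer.parseInt(toks[pos++]);   // literals only, for brevity
            while (pos < toks.length) {
                String op = toks[pos];
                int bp = bindingPower(op);
                if (bp < minBp) break;                 // weaker operator: hand back up
                pos++;
                int rhs = parse(bp + 1);               // +1 makes operators left-associative
                lhs = apply(op, lhs, rhs);
            }
            return lhs;
        }
        static int bindingPower(String op) {
            switch (op) { case "+": case "-": return 1; case "*": case "/": return 2; }
            throw new IllegalArgumentException(op);
        }
        static int apply(String op, int a, int b) {
            switch (op) { case "+": return a + b; case "-": return a - b;
                          case "*": return a * b; default: return a / b; }
        }
        public static void main(String[] args) {
            System.out.println(new Pratt("1 + 2 * 3 - 4").parse(0)); // prints 3
        }
    }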
On the Transfer of Control Between Contexts
Lecture Notes in Computer Science 19, Programming Symposium Proceedings, Colloque sur la Programmation, Paris, G. Goos and J. Hartmanis (Eds.), Springer-Verlag with B.W. Lampson and E.H. Satterthwaite
WATFOR – The University of Waterloo FORTRAN IV Compiler, with P.W. Shantz, R.A. German, R.S.K. Shirley, and C.R. Zarnke. CACM 10.
We describe the motivation, implementation, and performance of the University of Waterloo fast Fortran compiler, WATFOR.