Behavior Based Approach to Misuse Detection of a Simulated SCADA System

This paper presents initial findings from applying a behavior-based approach to detecting unauthorized activities in a simulated Supervisory Control and Data Acquisition (SCADA) system. Misuse detection of this type uses fault-free system telemetry to develop empirical models that learn normal system behavior. Monitored telemetry that subsequently shows statistically significant deviations from this learned behavior may indicate an attack or other unwanted actions. The experimental test bed consists of a set of Linux-based enterprise servers that were isolated from a larger university research cluster. All servers are connected to a private network and simulate several components and tasks seen in a typical SCADA system. Telemetry sources included kernel statistics, resource usage, and internal system hardware measurements. For this study, the Auto Associative Kernel Regression (AAKR) and Auto Associative Multivariate State Estimation Technique (AAMSET) methods are employed to develop empirical models. The prognostic efficacy of these methods for computer security was assessed using several groups of signals taken from the available telemetry classes. The Sequential Probability Ratio Test (SPRT) is used together with these models for intrusion detection. The intrusion types tested include host/network discovery, DoS, brute-force login, privilege escalation, and malicious exfiltration actions. All intrusion types tested altered the residuals of much of the monitored telemetry and could be detected in all signal groups used by both model types. The methods presented can be extended to industries besides nuclear that use SCADA or business-critical networks.
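
As an illustration of the detection step, the following is a minimal sketch of an SPRT applied to model residuals, assuming Gaussian residuals with known variance and a fixed positive mean-shift alternative. The parameter values, the reset-on-accept behavior, and the example data are illustrative choices, not those used in the study.

```python
import math

def sprt_alarm(residuals, mean_shift=1.0, sigma=1.0, alpha=0.01, beta=0.01):
    """Sequential Probability Ratio Test for a positive mean shift in residuals.

    H0: residuals ~ N(0, sigma^2); H1: residuals ~ N(mean_shift, sigma^2).
    Returns the index at which H1 is accepted (alarm), or None.
    """
    upper = math.log((1 - beta) / alpha)   # accept H1: raise an alarm
    lower = math.log(beta / (1 - alpha))   # accept H0: restart the test
    llr = 0.0
    for i, x in enumerate(residuals):
        # log-likelihood ratio increment for the Gaussian mean-shift test
        llr += (mean_shift / sigma**2) * (x - mean_shift / 2.0)
        if llr >= upper:
            return i            # statistically significant deviation
        if llr <= lower:
            llr = 0.0           # consistent with fault-free behavior so far
    return None

quiet  = [0.0] * 100                  # fault-free residuals
attack = [0.0] * 100 + [2.0] * 20     # mean shift begins at sample 100
print(sprt_alarm(quiet))    # None: no alarm on fault-free data
print(sprt_alarm(attack))   # 103: alarm a few samples after the shift
```

In a real deployment the residuals would come from subtracting AAKR or AAMSET estimates from the monitored telemetry, and one SPRT instance would run per signal.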



Simulation-based Code Duplication for Enhancing Compiler Optimizations

Compiler optimizations are often limited by control flow, which prohibits optimizations across basic block boundaries. Duplicating instructions from merge blocks to their predecessors enlarges basic blocks and can thus enable further optimizations. However, duplicating too many instructions leads to excessive code growth. Therefore, an approach is necessary that avoids code explosion and still finds beneficial duplication candidates. We present a novel approach to determine which code should be duplicated to improve peak performance. To this end, we analyze duplication candidates for subsequent optimizations by simulating a duplication and analyzing its impact on the compilation unit. This allows a compiler to find those duplication candidates that have the maximum optimization potential.
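
The core idea of simulating a duplication before committing to it can be sketched on a toy IR. The representation and the benefit metric below are illustrative only, not the actual Graal implementation: the merge block is a list of binary operations, and each predecessor contributes the constants it knows, so we can count how many operations would fold to constants if the block were copied into that predecessor.

```python
def simulate_duplication(merge_ops, pred_envs):
    """Estimate the benefit of duplicating a merge block into its predecessors.

    merge_ops: list of (dest, op, lhs, rhs) in the merge block.
    pred_envs: per-predecessor dicts mapping variable name -> known constant.
    Returns, per predecessor, how many operations would fold to constants
    after duplication -- computed without mutating the IR.
    """
    benefits = []
    for env in pred_envs:
        env = dict(env)          # simulate only: work on a private copy
        folded = 0
        for dest, op, lhs, rhs in merge_ops:
            a, b = env.get(lhs, lhs), env.get(rhs, rhs)
            if isinstance(a, int) and isinstance(b, int):
                env[dest] = a + b if op == "+" else a * b
                folded += 1      # this op becomes a constant here
        benefits.append(folded)
    return benefits

# merge block: t = x + 1; y = t * 2
ops = [("t", "+", "x", 1), ("y", "*", "t", 2)]
# predecessor 0 knows x == 3; predecessor 1 knows nothing about x
print(simulate_duplication(ops, [{"x": 3}, {}]))  # [2, 0]
```

A compiler could then duplicate only into predecessor 0, where both operations fold, and leave predecessor 1 untouched, avoiding code growth with no benefit.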




Poster about Simulation-based Code Duplication (abstract from associated DocSymp paper)

The scope of compiler optimizations is often limited by control flow, which prohibits optimizations across basic block boundaries. Code duplication can solve this problem by extending basic block sizes, thus enabling subsequent optimizations. However, duplicating code for every optimization opportunity may lead to excessive code growth. Therefore, a holistic approach is required that is capable of finding optimization opportunities and classifying their impact. This paper presents a novel approach to determine which code should be duplicated in order to improve peak performance. The approach analyzes duplication candidates for subsequent optimization opportunities. It does so by simulating a duplication operation and analyzing its impact on other optimizations. This allows a compiler to weigh up multiple success metrics in order to choose the code duplication operations with the maximum optimization potential. We further show how to map code duplication opportunities to an optimization cost model that allows us to maximize performance while minimizing code size increase.



Detecting Malicious JavaScript in PDFs Using Conservative Abstract Interpretation

To mitigate the risk posed by JavaScript-based PDF malware, we propose a static analysis technique based on abstract interpretation. Our evaluation shows that our approach identifies 100% of the malicious samples in our dataset with a low rate of false positives.
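
The conservative flavor of such an analysis can be illustrated on a miniature statement language. The abstract domain, statement forms, and the rule of flagging `eval` of non-constant data below are hypothetical simplifications for illustration, not the paper's actual analysis: each variable is tracked as either "const" (fully known) or "unknown", and any `eval` of a possibly unknown value is reported, so the analysis may raise false positives but never silently misses an eval of unknown code.

```python
def analyze(stmts):
    """Conservative abstract interpretation over a toy JS-like program.

    Returns the indices of eval() calls whose argument is not provably
    a compile-time constant string.
    """
    env = {}
    alarms = []
    for i, (kind, *args) in enumerate(stmts):
        if kind == "assign_const":       # x = "literal"
            var, _ = args
            env[var] = "const"
        elif kind == "assign_input":     # x = document.URL, user data, ...
            (var,) = args
            env[var] = "unknown"
        elif kind == "concat":           # x = y + z
            var, y, z = args
            env[var] = "const" if env[y] == env[z] == "const" else "unknown"
        elif kind == "eval":             # eval(x)
            (var,) = args
            if env.get(var, "unknown") != "const":
                alarms.append(i)         # conservatively report
    return alarms

benign  = [("assign_const", "s", '"alert(1)"'), ("eval", "s")]
suspect = [("assign_input", "u"), ("assign_const", "p", '"x="'),
           ("concat", "s", "p", "u"), ("eval", "s")]
print(analyze(benign))    # []
print(analyze(suspect))   # [3]
```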



Improving Parallelism in Hardware Transactional Memory

Hardware transactional memory (HTM) is supported by recent processors from Intel and IBM. HTM is attractive because it can enhance concurrency while simplifying programming. Today's HTM systems rely on existing coherence protocols, which implement a requester-wins strategy. This, in turn, leads to very poor performance when transactions frequently conflict, causing them to resort to a non-speculative fallback path. Often, such a path severely limits concurrency. In this paper, we propose very simple architectural changes to existing requester-wins HTM implementations. The idea is to support a special mode of execution in HTM, called power mode, which can be used to enhance conflict resolution between regular and so-called power transactions. A power transaction can run concurrently with regular transactions that do not conflict with it. This permits higher levels of concurrency in cases when a (regular) transaction cannot make progress due to conflicts and would otherwise require a non-speculative fallback path. Our idea is backward-compatible with existing HTM systems, imposing no additional cost on transactions that do not use the power mode. Furthermore, using power transactions requires no changes to target applications that employ traditional lock synchronization. Through extensive evaluation of micro-benchmarks and the STAMP benchmarks in a transactional memory simulator and in real hardware-based emulation, we show that our technique significantly improves the performance of the baseline that does not use power mode, and performs comparably with state-of-the-art related proposals that require more substantial architectural changes.



Persistent Memcached: Bringing Legacy Code to Byte-Addressable Persistent Memory

We report our experience building and evaluating pmemcached, a version of memcached ported to byte-addressable persistent memory. Persistent memory is expected to not only improve overall performance of applications’ persistence tier, but also vastly reduce the “warm up” time needed for applications after a restart. We decided to test this hypothesis on memcached, a popular key-value store. We took the extreme view of persisting memcached’s entire state, resulting in a virtually instantaneous warm up phase. Since memcached is already optimized for DRAM, we expected our port to be a straightforward engineering effort. However, the effort turned out to be surprisingly complex during which we encountered several non-trivial problems that challenged the boundaries of memcached’s architecture. We detail these experiences and corresponding lessons learned.



FastR update: Interoperability, Graphics, Debugging, Profiling, and other hot topics

This talk presents an overview of a number of areas of FastR that saw significant progress in the last year, e.g., interoperability, graphics, debugging, and compatibility.



Zero-overhead R and C/C++ integration with FastR

Traditionally, C and C++ are often used to improve the performance of R applications and packages. While this is usually not necessary with FastR, which can run R code at near-native performance, a large corpus of existing code implements critical pieces of functionality in native code. Alternative implementations of R need to simulate the R native API, a complex API that exposes many implementation details. Simulating it costs significant engineering effort and performance overhead, and it creates a compilation and optimization barrier between the languages. FastR can instead employ the Truffle framework to run native code, available as LLVM bitcode, inside the optimization scope of the polyglot environment, and thus integrate it without optimization or integration barriers.



Trace Register Allocation Policies: Compile-time vs. Performance Trade-offs

Register allocation has to be done by every compiler that targets a register machine, regardless of whether it aims for fast compilation or optimal code quality. State-of-the-art dynamic compilers often use global register allocation approaches such as linear scan. Recent results suggest that non-global trace-based register allocation approaches can compete with global approaches in terms of allocation quality. Instead of processing the whole compilation unit at once, a trace-based register allocator divides the problem into linear code segments, called traces. In this work, we present a register allocation framework that can exploit the additional flexibility of traces to select different allocation strategies based on the characteristics of a trace. This allows fine-grained control over the compile-time vs. peak-performance trade-off. Our framework features three allocation strategies: a linear-scan-based approach that achieves good code quality, a single-pass bottom-up strategy that aims for short allocation times, and an allocator for trivial traces. We present 6 allocation policies to decide which strategy to use for a given trace. The evaluation shows that this approach can reduce allocation time by 3-43% at a peak performance penalty of about 0-9% on average. For systems that do not mainly focus on peak performance, our approach allows adjusting the time spent on register allocation, and therefore the overall compilation time, to find the optimal balance between compile time and peak performance according to an application's requirements.
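
A per-trace policy of this kind can be sketched as a small dispatch function. The trace representation, thresholds, and strategy names below are illustrative assumptions, not the six policies evaluated in the paper:

```python
def pick_strategy(trace, budget="balanced"):
    """Pick an allocation strategy for one trace.

    trace: list of instruction dicts; a trace is trivial if no instruction
    in it needs a register assignment.  The "budget" knob models the
    compile-time vs. peak-performance trade-off.
    """
    if not any(ins.get("needs_register") for ins in trace):
        return "trivial"              # nothing to allocate at all
    hotness = sum(ins.get("exec_count", 0) for ins in trace)
    if budget == "fast" or (budget == "balanced" and hotness < 1000):
        return "bottom-up"            # single pass, short allocation time
    return "linear-scan"              # better code quality on hot traces

hot_trace  = [{"needs_register": True, "exec_count": 5000}]
cold_trace = [{"needs_register": True, "exec_count": 10}]
print(pick_strategy(hot_trace))    # linear-scan
print(pick_strategy(cold_trace))   # bottom-up
```

The point of such a policy is that expensive, quality-oriented allocation is spent only where it pays off, while cold or trivial traces are handled as cheaply as possible.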



Practical partial evaluation for high-performance dynamic language runtimes

Most high-performance dynamic language virtual machines duplicate language semantics in the interpreter, compiler, and runtime system. This violates the don't-repeat-yourself principle. In contrast, we define languages solely by writing an interpreter. The interpreter performs specializations, e.g., augments the interpreted program with type information and profiling information. Compiled code is derived automatically using partial evaluation while incorporating these specializations. This makes partial evaluation practical in the context of dynamic languages: it reduces the size of the compiled code while still compiling all parts of an operation that are relevant for a particular program. When a speculation fails, execution transfers back to the interpreter, the program re-specializes in the interpreter, and later partial evaluation again transforms the new state of the interpreter to compiled code. We evaluate our approach by comparing our implementations of JavaScript, Ruby, and R with best-in-class specialized production implementations. Our general-purpose compilation system is competitive with production systems even when they have been heavily optimized for the one language they support. For our set of benchmarks, our speedup relative to the V8 JavaScript VM is 0.83x, relative to JRuby is 3.8x, and relative to GNU R is 5x.
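
The idea of deriving compiled code from an interpreter can be shown in miniature. The closure-based "compilation" below is only a sketch of partial evaluation under strong simplifying assumptions (a two-opcode language, no speculation or deoptimization); the actual system partially evaluates Truffle interpreters inside the Graal compiler:

```python
def interpret(program, x):
    """A tiny interpreter: program is a list of (op, operand) pairs."""
    acc = x
    for op, n in program:
        if op == "add":
            acc += n
        elif op == "mul":
            acc *= n
    return acc

def specialize(program):
    """Partially evaluate the interpreter with respect to a fixed program.

    The dispatch on `op` happens once, here, instead of on every run --
    the residual `compiled` function contains only the surviving steps.
    """
    steps = []
    for op, n in program:
        if op == "add":
            steps.append(lambda acc, n=n: acc + n)
        elif op == "mul":
            steps.append(lambda acc, n=n: acc * n)
    def compiled(x):
        for step in steps:
            x = step(x)
        return x
    return compiled

prog = [("add", 1), ("mul", 3)]
compiled = specialize(prog)
assert compiled(4) == interpret(prog, 4) == 15
```

In the full system, the specializations the interpreter gathers (types, profiles) are part of the state that partial evaluation bakes into the residual code, and a failed speculation transfers control back to the interpreter.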



SOAP 2017 Presentation - An Efficient Tunable Selective Points-to Analysis for Large Codebases

Points-to analysis is a fundamental static program analysis technique for tools such as compilers and bug checkers. Although object-based context sensitivity is known to improve the precision of points-to analysis, scaling it to large Java codebases remains a challenge. In this work, we develop a tunable, client-independent, object-sensitive points-to analysis framework where heap cloning is applied selectively. This approach is aimed at large codebases where the standard analysis is typically expensive. Our design includes a pre-analysis that determines the program points that contribute to the cost of an object-sensitive points-to analysis. A subsequent analysis then determines the context depth for each allocation site. While our framework can run standalone, it is also possible to tune it: the user of the framework can use knowledge of the codebase being analysed to influence the selection of expensive program points as well as the process of differentiating the required context depth. Overall, the approach determines where cloning is beneficial and where it is unlikely to be beneficial. We have implemented our approach using Soufflé (a Datalog compiler) and an extension of the DOOP framework. Our experiments on large programs, including OpenJDK, show that our technique is efficient and precise. For OpenJDK, our analysis reduces runtime by 27% and memory usage by 18% for a negligible loss of precision, while for Jython from the DaCapo benchmark suite, the same analysis reduces runtime by 91% with no loss of precision.
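
The depth-selection step can be sketched as a function from pre-analysis cost estimates to per-site context depths. The cost metric, thresholds, and site names below are hypothetical illustrations, not the rules used in the DOOP-based implementation:

```python
def select_context_depth(sites, flow_cost, budget):
    """Assign a context depth to each allocation site.

    sites: iterable of allocation-site ids; flow_cost[s] is a pre-analysis
    estimate of how many points-to tuples site s contributes.  Expensive
    sites get depth 0 (no heap cloning); cheap ones get deeper contexts,
    where cloning is affordable and most likely to pay off in precision.
    """
    depths = {}
    for s in sites:
        c = flow_cost[s]
        if c > budget:            # cloning here would blow up the analysis
            depths[s] = 0
        elif c > budget // 10:    # moderately expensive: shallow cloning
            depths[s] = 1
        else:                     # cheap site: full object sensitivity
            depths[s] = 2
    return depths

cost = {"new A@main": 50, "new HashMap@lib": 50_000, "new B@util": 4_000}
print(select_context_depth(cost, cost, budget=10_000))
# {'new A@main': 2, 'new HashMap@lib': 0, 'new B@util': 1}
```

The subsequent object-sensitive analysis would then clone heap abstractions only up to each site's assigned depth, which is how the selective approach keeps the cost of large codebases under control.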

