Experimental: Models for Big data stream processing

As part of the EU FP7 project JUNIPER, we have been experimenting with modelling support for big data streaming applications within Modelio.

JUNIPER is a research project that started in October 2012 and is funded by the European Commission. Led by The Open Group and under the scientific steering of the University of York, it aims to develop a Java platform that can support real-time big data streaming applications, coupled with architectural patterns, Java acceleration technology and a modelling environment based on UML and implemented in Modelio (hereafter called the MDE Environment).

Context

As displayed in the picture above, the MDE Environment stores the model of a JUNIPER application along with its real-time constraints. This model is then used to generate the skeleton of the application's Java code, which is then filled in by the developer. Thanks to Modelio's round-trip reverse engineering support, the model is automatically kept up to date with the code. The MDE Environment also generates scripts for compiling and deploying the application.

Finally, it feeds other components developed during the JUNIPER project with useful information at design time and at run time. The Schedulability Analysis tool receives a simplified model of the application's deployment, behavior and architecture, so that developers can anticipate whether the application will meet the defined constraints. The Scheduling Advisor receives deployment and architecture models, so that it can monitor the application at runtime and provide advice on how to improve its performance.

At runtime, applications run in either the online or the offline environment described in the picture below.

Context

In both cases, the developer code mainly interfaces with the JUNIPER API, which in turn interfaces with the JUNIPER Platform, translating Java method calls into low-level communication patterns using the MPI library. In the online environment, the application runs on the Jamaica VM real-time Java Virtual Machine, which interfaces with the real-time version of Linux and the FPGA support produced by the JUNIPER project. In the offline environment, a standard JDK is used, and the Offline MPI library simulates, in plain Java, the MPI environment expected by the JUNIPER platform.

This tutorial will present the programming model put forward by JUNIPER and realized in the form of the JUNIPER MDE Environment. To illustrate the approach we will build a simple data streaming pipeline based on the well-known map-reduce pattern.

Prerequisites

For this tutorial, you will need:

  • Modelio 3.3 open source, which can be found here.

  • Pattern Designer 3.3 Free, which can be found here.

  • MARTE Designer 3.3.01, which can be found here.

  • The MDE Environment module file: JuniperIDE_0.2.411.

  • The JDK 1.8 model library, which can be found here.

  • The JUNIPER API and other necessary jars to be included in your project’s class path, which can be found here.

  • The Map reduce pattern we will reuse to build the sample application, which can be found here.

In this tutorial, we will not present the basics of Modelio usage; for that, refer to our Modelio-specific documentation.

Why model big data applications?

The advent of big data applications is in fact the advent of highly parallel distributed applications. In contrast to high performance computing (HPC) applications, big data applications need to deal with a wide variety of data sources, data types and processing pipelines. Given their novelty, big data applications are usually developed using ad hoc coding and documentation techniques that are not necessarily well suited to these new challenges.

JUNIPER’s answer to these challenges relies on a modelling paradigm that focuses on the elements underlying big data applications, allowing users to model the application architecture, the manipulated data types and the deployment. The added value of modelling is even more evident when it comes to using NoSQL databases, since these systems usually do not provide off-the-shelf support for data modelling.

An added bonus of using a modelling tool is the increased automation it can support. As we will see in this tutorial, the JUNIPER MDE Environment supports model transformations and code generation helpers that simplify the work of developers dealing with such applications.

The JUNIPER programming model

The picture below illustrates the main concepts behind the JUNIPER programming model. JUNIPER Applications are composed of Programs that store Data and are hosted by Nodes. Communication Channels connect programs and Real-time constraints define program execution limits.

At the modelling level, elements are divided into two parts. The Software platform model defines the high-level architecture of the application (Programs, Data, Communication Channels and Real-time constraints), while the Hardware model defines the Nodes where the application will be deployed, as well as how many times each JUNIPER Program will be replicated at deployment time.
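To make these concepts more concrete, they can be pictured as a handful of plain Java classes. The sketch below is purely illustrative: the names `ModelSketch`, `Program`, `Channel`, `Node` and `Application`, the `deadlineMillis` field and the replica counts are assumptions made for this example, and they are not part of the JUNIPER API or of the MDE Environment metamodel. Data types are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the JUNIPER concepts; not the actual MDE Environment metamodel.
final class ModelSketch {

    /** A program of the software platform, with an illustrative real-time constraint. */
    static final class Program {
        final String name;
        final long deadlineMillis; // assumed form of a constraint; 0 means "none"

        Program(String name, long deadlineMillis) {
            this.name = name;
            this.deadlineMillis = deadlineMillis;
        }
    }

    /** A communication channel connecting two programs. */
    static final class Channel {
        final Program from;
        final Program to;

        Channel(Program from, Program to) {
            this.from = from;
            this.to = to;
        }
    }

    /** A node of the hardware model: hosts one program, replicated a number of times. */
    static final class Node {
        final Program hosted;
        final int replicas;

        Node(Program hosted, int replicas) {
            this.hosted = hosted;
            this.replicas = replicas;
        }
    }

    /** An application groups the software platform and the hardware model. */
    static final class Application {
        final List<Program> programs = new ArrayList<>();
        final List<Channel> channels = new ArrayList<>();
        final List<Node> nodes = new ArrayList<>();

        Program addProgram(String name, long deadlineMillis) {
            Program p = new Program(name, deadlineMillis);
            programs.add(p);
            return p;
        }

        void connect(Program from, Program to) {
            channels.add(new Channel(from, to));
        }

        void deploy(Program program, int replicas) {
            nodes.add(new Node(program, replicas));
        }
    }

    /** Builds a two-program fragment with illustrative replica counts. */
    static Application sample() {
        Application app = new Application();
        Program generator = app.addProgram("Generator", 0);
        Program mapper = app.addProgram("Mapper", 0);
        app.connect(generator, mapper);
        app.deploy(generator, 1);
        app.deploy(mapper, 3);
        return app;
    }
}
```

In this picture, the software platform corresponds to the `programs` and `channels` lists, and the hardware model to the `nodes` list.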

Modelling

Sample application

An overview of the sample application is displayed in the picture below. It consists of a map/reduce pipeline that computes the sum of the integers it receives as input. For the sake of simplicity, a Generator program generates random integers to feed the pipeline, and Mappers and Reducers both sum the numbers received as input. Notice that in a real application, they would do different computations.

Sample application

The rest of this tutorial will show how to implement this application using the MDE Environment and how to run it on Eclipse.

The complete source code of the application can be found here.

Youtube Video

The steps in this tutorial can also be found in the following video:

Basic configuration: installing the Modelling environment and creating an empty project

In order to install the MDE Environment and its prerequisite modules, add them to the Modelio modules catalogue. They will then be available to users, who can add them to existing or new modelling projects.

Adding or removing modules from the modules catalogue

Modelio modules are complementary components, each of which provides specific services tailored to a particular modelling need. Modelio provides a number of modules, all of which exploit a model for a specialized need (for example, documentation or Java code generation). When a module is installed, it provides specific menus, icons and specialized annotations.

To add or remove a module from the modules catalogue, use the Add a module to this catalogue and Remove module from the catalogue buttons.

The screenshot below shows how to add a module to the modules catalogue.

Creating a project in Modelio.

  1. Open the Configuration / Modules catalogue command.
  2. In the Modules catalogue window, click on Add a module to the catalogue and use the file browser to select the modules (*.jmdac files).
  3. Click on the Close button when your catalogue is up to date.

Creating a new project

To create a new Modelio project that is going to contain the JUNIPER model:

  1. Click on File\New project.
  2. Enter the name of the project.
  3. Enter the description of the project.
  4. Click on Create to create and open the project.

Adding a module to a project

The figure below shows how to add a module to a project:

Adding a module to a Modelio project.

  1. Open the Module Configuration page in the Configuration menu.
  2. Expand the Modules catalogue.
  3. In the Modules catalogue, select the module you want to install.
  4. Install the module in the project.

Creating a JUNIPER Application model

In order to create a JUNIPER Application model, right click on the root package of your project and select the Create Juniper model sub menu on the JuniperIDE module menu. The figure below shows the structure of a JUNIPER Application model. It is divided into two parts: a software platform and a hardware platform. The software platform describes the JUNIPER programs, their connections and behaviour in a high level way, while the hardware platform describes their abstract deployment on the JUNIPER platform.

Structure of a JUNIPER Application.

In this tutorial, we will focus on the architectural and deployment features of the application model. Behavioral and data modelling are outside the scope of this tutorial.

New elements can be added to a JUNIPER model just by right clicking on an existing element and then using the creation options in the JuniperIDE module menu. The JuniperIDE module dynamically adapts in order to propose creation actions for all elements that can be created under a given model element. For example, the figure below shows the set of elements that can be created under a Software platform.

Creating new modelling elements using the JUNIPER IDE submenu.

As we can see, four kinds of elements can be created: a non-pre-emptive software region, a program, a Java class or a Java interface.

Using PatternDesigner to model a simple map/reduce pipeline

During the JUNIPER project we have identified a couple of relevant Big Data architectural patterns on top of the JUNIPER platform. They have been documented and implemented by means of Modelio’s Pattern Designer module.

In this tutorial, we will use this module, together with the Map reduce pattern, to instantiate a map reduce pipeline in our project. The advantage of the Pattern Designer module is that it generates the necessary JUNIPER programs along with the code needed to implement the semantics of the pattern.

In order to install and apply the pattern, follow the instructions below.

  1. Right click on the Software platform
  2. Select Pattern Designer > Apply Pattern
  3. Click on the Add new pattern button
  4. Select the Map Reduce Pattern_1.0.00.umlt file
  5. Click on the newly installed pattern
  6. Go to the Parameters tab and fill the Pipeline name field

Pattern designer will then create a set of JUNIPER programs representing the map reduce pipeline.

Completing the software platform model with a Generator program

For demo purposes, we will add a JUNIPER program to generate data to be processed by the map reduce pipeline.

In order to do so, follow the steps below:

  1. Open the diagram in the Software platform
  2. Click on the Create Juniper program button
  3. Rename the created program to Generator
  4. Use the Connect Juniper programs button to connect the Generator program to the mapper using its interface

The final architecture of our sample application will be the following:

Software Platform architecture of our example

Creating Hardware platform models

Hardware models describe the cloud nodes needed to deploy the programs in the application. Each program should be instantiated once in a node. Multiplicities and the ip field are used to define how many actual instances are needed and where these instances should be deployed at runtime.

The hardware platform model can be generated automatically from the software platform model. In order to do that:

  1. Right click on the JUNIPER Application
  2. Select Juniper IDE module > Model transformations > Generate hardware platform model from software platform model

In order to define the number of instances of each program to be deployed, do the following:

  1. Click on a node at the Hardware platform model
  2. In the Element View, set its multiplicity min and max values to the number of instances you want to deploy.
  3. Still on the Element View, on the <<CloudNode>> tag value ip define the IP addresses where the instances should be deployed.
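As a rough illustration of how a multiplicity and a list of IP addresses could translate into concrete instances, consider the sketch below. It is not taken from the JUNIPER platform: the `CloudNode` class, the `instances` and `totalInstances` helpers, the example addresses and, above all, the round-robin assignment of instances to addresses are assumptions made only for this example. How the platform actually places instances is not described in this tutorial.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the round-robin placement policy is an assumption.
final class DeploymentSketch {

    /** A hardware node: one program, a multiplicity (min = max) and candidate IPs. */
    static final class CloudNode {
        final String program;
        final int multiplicity;
        final List<String> ips;

        CloudNode(String program, int multiplicity, List<String> ips) {
            if (multiplicity < 1 || ips.isEmpty()) {
                throw new IllegalArgumentException("need at least one instance and one IP");
            }
            this.program = program;
            this.multiplicity = multiplicity;
            this.ips = ips;
        }
    }

    /** Lists one entry per instance, cycling through the node's IP addresses. */
    static List<String> instances(CloudNode node) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < node.multiplicity; i++) {
            String ip = node.ips.get(i % node.ips.size());
            result.add(node.program + "#" + i + "@" + ip);
        }
        return result;
    }

    /** Total number of program instances described by a hardware model. */
    static int totalInstances(List<CloudNode> nodes) {
        int total = 0;
        for (CloudNode node : nodes) {
            total += node.multiplicity;
        }
        return total;
    }
}
```

For instance, under these assumptions a node hosting a mapper with multiplicity 3 and two addresses would yield three instances, with the third one reusing the first address.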

Code generation and Eclipse import

The MDE Environment generates MPI based Java code that abstracts the communication mechanisms between JUNIPER programs. It also generates the configuration files used by the JUNIPER platform in order to deploy the programs accordingly.

In order to trigger the code generation feature of the MDE Environment, right-click on the application and choose the option Juniper MDE module > Generate code.

The JUNIPER model is updated to represent the code of the application: the generated code is represented in the MDE Environment as a model, which the user can then explore and change freely. The structure of the code is represented by means of UML packages, classes, attributes and operations. The code of Java methods and attribute initializers is represented by means of UML notes. For more information on how Java elements are represented as UML elements, refer to the documentation of the Java Designer module.

The generated code uses OpenMPI to implement communication between programs. This code is hidden behind the JUNIPER API referenced by the generated code; the API handles the low-level work involved in sending and receiving MPI messages. The JUNIPER code generation helper uses the Proxy design pattern to provide proxies to called programs in the form of Java interface implementations. The proxies hide the complexity of MPI calls, allowing the developer to communicate by means of plain Java calls.
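The idea of a proxy that turns plain method calls into messages can be sketched with the standard `java.lang.reflect.Proxy` class. This is only an illustration of the Proxy design pattern, not the code generated by the MDE Environment: an in-memory queue stands in for the MPI transport, and `ProxySketch`, `Message` and `proxyFor` are hypothetical names. Whether the JUNIPER proxies are built with dynamic proxies or as generated classes is not stated here.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Illustrative sketch only: a queue stands in for MPI, and none of these
// names belong to the real JUNIPER API.
final class ProxySketch {

    /** Interface offered by the called program (same shape as in this tutorial). */
    interface ISumNumbersMapper {
        void map(List<Integer> data);
    }

    /** A message as it might travel between programs: method name plus arguments. */
    static final class Message {
        final String method;
        final Object[] args;

        Message(String method, Object[] args) {
            this.method = method;
            this.args = args;
        }
    }

    /** Builds a proxy that turns every interface call into a queued message. */
    static <T> T proxyFor(Class<T> iface, Queue<Message> transport) {
        InvocationHandler handler = (proxy, method, callArgs) -> {
            transport.add(new Message(method.getName(), callArgs));
            return null; // only void methods are supported in this sketch
        };
        Object instance = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(instance);
    }

    public static void main(String[] args) {
        Queue<Message> transport = new ArrayDeque<>();
        ISumNumbersMapper mapper = proxyFor(ISumNumbersMapper.class, transport);

        mapper.map(Arrays.asList(1, 2, 3)); // looks like a plain Java call

        Message sent = transport.remove();  // what the "transport" actually saw
        System.out.println(sent.method + " " + Arrays.toString(sent.args));
    }
}
```

The caller only sees the `ISumNumbersMapper` interface; everything related to message construction and delivery stays inside the handler, which is the property the generated proxies rely on.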

Importing the generated code into Eclipse

In order to import the generated code, just import it as a new project in Eclipse. The code is generated in a folder named src\Project Name under the Modelio project folder. This folder contains the generated code, deployment descriptors, and compilation and execution scripts.

Java code structure

As displayed below, Java code (1) is separated from (2) deployment descriptors and (3) compilation and execution scripts. Developers can then modify the Java files, and the changes will be reverse engineered automatically by the MDE Environment. The developer is only supposed to change the source code files.

Analysis of the application code

The complete source code of the application can be found here.

Generator

The generator program generates a set of random integers and sends them to the Mapper programs. The execute method contains the behavior of the program. The communication between the generator and the mapper is done via the attribute programSumNumbersMapperISumNumbersMapper. This attribute contains a proxy to the mapper programs that implements the ISumNumbersMapper interface. The communication between programs is thus completely hidden behind plain method calls.

Notice that Java elements are annotated with the @objid annotation. It links the Java elements to their corresponding UML elements inside Modelio, which allows Java Designer to update the model when the Java source code is modified. These annotations are omitted in the following sections.

@objid ("b742ca20-c7c6-48b0-b836-ae1b34c01050")
public class Generator extends org.modelio.juniper.platform.JuniperProgram {
    @objid ("4957a769-ed50-4b37-b4bb-12da8535add3")
    public ISumNumbersMapper programSumNumbersMapperISumNumbersMapper = ... ;

    @objid ("9bacd2b6-8479-4d02-804f-008dabe063cb")
    public void execute() {
        Random random = new Random();
        List<Integer> nums = new ArrayList<Integer>();
        for(int i=0;i<15;++i) {
            nums.add(random.nextInt(1000));
        }
        programSumNumbersMapperISumNumbersMapper.map(nums);
        System.out.println("Generator sending:  " + nums);

        try {
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

}

ProgramSumNumbersMapper

The mapper program implements the ISumNumbersMapper interface. It simply collects the integers generated by the Generator (via the map() method) and passes them to the Reducer(s). Notice that the provided interfaces are marked with the annotation @Provided.

public class ProgramSumNumbersMapper extends org.modelio.juniper.platform.JuniperProgram {
    public ISumNumbersReducer programSumNumbersReducerISumNumbersReducer = ...;

    @Provided
    public ISumNumbersMapper iSumNumbersMapperImpl = new juniperApplication.ISumNumbersMapper() {
        @Override
        public void map(List<Integer> data) {
            ...
            List<Integer> rd = ProgramSumNumbersReducer.reduce(data);
            programSumNumbersReducerISumNumbersReducer.reduce(rd);
        }};

}
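
How the JUNIPER platform locates the fields marked with @Provided is not described in this tutorial. One common technique for this kind of marker annotation is reflection, which the following sketch illustrates. Everything in it is an assumption made for the example: the `Provided` annotation declared here is a stand-in with runtime retention (the real one may be declared differently), and `ProvidedSketch`, `MapperProgram` and `providedFields` are hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: shows how a marker annotation can be discovered by reflection.
final class ProvidedSketch {

    /** Stand-in for the tutorial's @Provided; the real annotation may differ. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Provided {
    }

    interface ISumNumbersMapper {
        void map(List<Integer> data);
    }

    /** A minimal program with one provided interface and one ordinary field. */
    static final class MapperProgram {
        @Provided
        public ISumNumbersMapper iSumNumbersMapperImpl = data -> { };

        public String description = "not a provided interface";
    }

    /** Returns the names of all public fields marked with @Provided. */
    static List<String> providedFields(Class<?> programClass) {
        List<String> names = new ArrayList<>();
        for (Field field : programClass.getFields()) {
            if (field.isAnnotationPresent(Provided.class)) {
                names.add(field.getName());
            }
        }
        return names;
    }
}
```

Calling `ProvidedSketch.providedFields(ProvidedSketch.MapperProgram.class)` returns a list containing only `iSumNumbersMapperImpl`, since the other field carries no annotation.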

ProgramSumNumbersReducer

The reducer program provides the ISumNumbersReducer interface. It receives the numbers from the mappers, runs the reduce() method on them and sends the result to the general reducer. The reduce method computes the sum of all received integers.

public class ProgramSumNumbersReducer extends org.modelio.juniper.platform.JuniperProgram {
    @Provided
    public ISumNumbersReducer iSumNumbersReducerImpl = new juniperApplication.ISumNumbersReducer() {
        @Override
        public void reduce(List<Integer> data) {
            programSumNumbersGeneralReducerISumNumbersReducer.reduce(ProgramSumNumbersReducer.reduce(data));
        }};

    public ISumNumbersReducer programSumNumbersGeneralReducerISumNumbersReducer = ...;

    public static List<Integer> reduce(final List<Integer> data) {
        Integer sum = data.stream().reduce(0, (x,y)->x+y);
        ArrayList<Integer> ret = new ArrayList<Integer>();
        ret.add(sum);
        return ret;
    }

}

ProgramSumNumbersGeneralReducer

Finally, the general reducer reduces the sums received from the reducers and displays the final result.

public class ProgramSumNumbersGeneralReducer extends org.modelio.juniper.platform.JuniperProgram {
    @Provided
    public ISumNumbersReducer iSumNumbersReducerImpl = new juniperApplication.ISumNumbersReducer() {
        @Override
        public void reduce(List<Integer> data) {
            processResult(ProgramSumNumbersReducer.reduce(data).get(0));
        }};

    public void processResult(final Integer result) {
        System.out.println("Result: " + result);
    }

}

Offline execution

In order to simplify JUNIPER applications development and tests, the OfflineMPI component reimplements the OpenMPI Java bindings so that MPI programs can run offline in a Java development environment. Not all features of MPI have been implemented. The current implementation focuses on the features of MPI that are required to make the JUNIPER platform run, namely, the basic synchronous and asynchronous message passing features.

In order to run your application, you need to create a new Java run configuration in Eclipse. The configuration should be defined as follows:

Main type: RunOnPlatform (from mpi)
Program arguments: {number of programs} "rte_deployment_plan.xml"
VM arguments: -Dorg.modelio.juniper.ExecutionLogger=none

Just run the configuration and all programs will be instantiated and run locally.

The output of the application execution should look like the following.

Generator sending:  [532, 11, 636, 733, 677, 409, 901, 374, 650, 260, 832, 982, 868, 266, 500]
Mapper 0 received: [532, 733, 901, 260, 868]
Mapper 1 received: [11, 677, 374, 832, 266]
Reducer 0 received: [3294]
Mapper 2 received: [636, 409, 650, 982, 500]
Reducer 2 received: [3177]
Reducer 1 received: [2160]
Result: 8631

In short, the generator sends integers to the mappers, which sum them and send the results to the reducers. The reducers in turn sum what they receive and send the result to the general reducer, which computes the final result.
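
The arithmetic behind the sample output can be reproduced in plain Java, without any JUNIPER or MPI classes. The sketch below is only a walk-through of the data flow: `PipelineSketch`, `split`, `sum` and `run` are hypothetical names, and the round-robin distribution of the generated numbers over three mappers is inferred from the sample output above rather than from the platform documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java walk-through of the data flow; no JUNIPER or MPI classes are involved.
final class PipelineSketch {

    /** Splits the input round-robin across `parts` lists (inferred from the sample output). */
    static List<List<Integer>> split(List<Integer> data, int parts) {
        List<List<Integer>> result = new ArrayList<>();
        for (int p = 0; p < parts; p++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < data.size(); i++) {
            result.get(i % parts).add(data.get(i));
        }
        return result;
    }

    /** Sums a list of integers, as the mappers and reducers do in the sample. */
    static int sum(List<Integer> data) {
        int total = 0;
        for (int n : data) {
            total += n;
        }
        return total;
    }

    /** Runs generator -> mappers -> reducers -> general reducer on the given input. */
    static int run(List<Integer> generated, int mappers) {
        List<Integer> reducerResults = new ArrayList<>();
        for (List<Integer> share : split(generated, mappers)) {
            int mapped = sum(share);                         // mapper sums its share
            reducerResults.add(sum(Arrays.asList(mapped)));  // reducer sums what it receives
        }
        return sum(reducerResults);                          // general reducer
    }

    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(
                532, 11, 636, 733, 677, 409, 901, 374, 650, 260, 832, 982, 868, 266, 500);
        System.out.println("Result: " + run(nums, 3)); // prints "Result: 8631"
    }
}
```

With the numbers from the sample run, the three shares sum to 3294, 2160 and 3177, which add up to the final result of 8631.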

Conclusion

In this tutorial we presented part of the programming model and tools provided by the JUNIPER research project. The implemented programming model allows the modelling of stream-based big data applications and the generation of structural and communication-related code.

For more information on the tools provided by the JUNIPER project, visit the project Website and its official GitHub repository.

Context.PNG (24.3 KB) Marcos Almeida, 24 November 2015 16:48

Environment.PNG (13.6 KB) Marcos Almeida, 24 November 2015 16:48

Modelling.PNG (15 KB) Marcos Almeida, 24 November 2015 16:48

Map Reduce Pattern_1.0.00.umlt (27.7 KB) Marcos Almeida, 24 November 2015 16:51

JuniperIDE_0.2.411.jmdac (547 KB) Marcos Almeida, 24 November 2015 16:52

Libraries.zip (3.74 MB) Marcos Almeida, 24 November 2015 16:58

SampleApplication.PNG (7.56 KB) Marcos Almeida, 24 November 2015 17:10

SampleApplication.zip (17.9 KB) Marcos Almeida, 24 November 2015 17:15

JavaStructure.PNG (183 KB) Marcos Almeida, 20 January 2016 15:29

ApplicationStructure.PNG (7.01 KB) Marcos Almeida, 20 January 2016 15:35

DynamicMenu.PNG (100 KB) Marcos Almeida, 20 January 2016 15:36

ExampleArchitecture.PNG.png (21.6 KB) Marcos Almeida, 20 January 2016 15:39

ExampleArchitecture.PNG.png (29.3 KB) Marcos Almeida, 20 January 2016 16:11

JavaStructure.PNG (115 KB) Marcos Almeida, 20 January 2016 16:15