Integrated Data as the Foundation of SE – Part 4c

Posted on: January 1st, 2017 by Lou Wheatcraft

In this third section of part 4, I discuss the current state of most organizations concerning practicing SE from a data-centric perspective and the path needed to move from the current state to a future state where the projects within an organization practice SE from a data-centric perspective using a common, integrated dataset.

Developing the Integrated Dataset

“SE, from a data-centric perspective, involves the formalized application of an integrated dataset to represent the SE work products and underlying data and information generated throughout the system lifecycle.”

The common, integrated dataset is at the core of practicing SE from a data-centric perspective. The integrated dataset includes the data and information from several databases and files created by the various SE tools used to develop, document, and manage the various work products (e.g., use cases, diagrams, requirements, models, and designs) and their underlying data and information.  In adopting SE from a data-centric perspective, the end state is to integrate these databases and files into a common dataset as shown in Figure 6.

It is critical that, at the beginning of a project, the project defines and documents an ontology and a master schema for the project’s common, integrated dataset.  Doing this allows the work products and their underlying data and information to be shared between SE tools and other organizations.  Key considerations in defining the schema include defining the entities, consistent with the ontology, that will be stored in the databases; defining the attributes that will be included as part of a requirement expression; defining the measures that will be used to track the project status; and defining the reports and associated data and information needed by management.  This includes both PM and SE management reports, data, and information.
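
To make the schema considerations above more concrete, here is a minimal sketch in Python of one entity, a few requirement attributes, and one status measure. Every name in it (`Requirement`, `Status`, `status_counts`, and the particular attributes) is a hypothetical illustration, not something taken from a standard or from any specific SE tool.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Status(Enum):
    # Hypothetical status values a project might choose to track
    DRAFT = "draft"
    APPROVED = "approved"
    VERIFIED = "verified"


@dataclass
class Requirement:
    # An entity that the (assumed) project ontology defines
    req_id: str
    text: str
    # Attributes attached to the requirement expression
    rationale: str = ""
    owner: str = ""
    status: Status = Status.DRAFT
    # IDs of the parent requirements this requirement traces to
    traces_to: List[str] = field(default_factory=list)


def status_counts(reqs: List[Requirement]) -> Dict[str, int]:
    """A simple management measure: number of requirements per status."""
    counts = {s.value: 0 for s in Status}
    for req in reqs:
        counts[req.status.value] += 1
    return counts
```

A real master schema would cover many more entity types (designs, models, verification activities, budget and schedule items), but the same idea applies: the entities, attributes, and measures are agreed on once, up front, so that every tool can read and write them.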

Achieving an SE capability level where the project has an integrated dataset will not happen overnight.  It will take a journey lasting several years.

Moving From The Current State To A State That Includes An Integrated Dataset

Figure 3: Current State: Stovepiped organizations and datasets

As shown in Figure 3, the current state of many projects is that the various SE lifecycle processes are divided across organizational units operating in stovepipes, using a variety of legacy project management and SE tools to develop and manage the various work products and their underlying data and information. There is no defined master ontology for the enterprise or the project, and there is no master schema for the datasets representing the work products. The SE tools store the data and information representing the various work products either as electronic files or documents (shown as a solid line) or in their own proprietary database using a proprietary schema.  Unless these tools support a standard for sharing data with other tools, the data in these individual databases are not compatible, making it difficult to share data between tools and organizations.

As shown in Figure 4 and Figure 5, interim states, the project has a master ontology and a master schema defined for their integrated dataset.   The tools in the organization’s SE toolset used to generate and manage work products and their underlying data and information have either:

*  special software procured or developed to extract data and information from the individual SE tool databases, transform that data and information to conform to the project’s master dataset schema, and load the transformed data and information into the integrated dataset (this process is referred to as the ETL process); or

*  databases having a schema that is consistent with the project’s master dataset schema allowing the data and information in these databases to be integrated directly into the project’s common, integrated dataset without having to go through an ETL process.
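
The ETL process described in the first bullet can be sketched as three small Python functions. This is only an illustration under stated assumptions: the proprietary field names (`ReqNo`, `Statement`, `State`), the master-schema field names, and the use of a plain dictionary as the integrated dataset are all hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping from one tool's proprietary field names
# to the field names used by the project's master schema.
FIELD_MAP = {"ReqNo": "req_id", "Statement": "text", "State": "status"}


def extract(tool_rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Extract: copy the raw records exported by the SE tool."""
    return [dict(row) for row in tool_rows]


def transform(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Transform: rename fields to the master schema and normalize values."""
    records = []
    for row in rows:
        # Keep only mapped fields; unmapped tool-specific fields are dropped
        rec = {FIELD_MAP[k]: v.strip() for k, v in row.items() if k in FIELD_MAP}
        rec["status"] = rec.get("status", "draft").lower()
        records.append(rec)
    return records


def load(records: List[Dict[str, str]], dataset: Dict[str, Dict[str, str]]) -> None:
    """Load: insert or update records in the integrated dataset, keyed by ID."""
    for rec in records:
        dataset[rec["req_id"]] = rec
```

Every tool with a non-conforming database needs its own mapping of this kind, and the mapping has to be maintained whenever either schema changes, which is where much of the cost of the first option comes from.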

Figure 4: Interim State: Data from existing databases imported into the integrated dataset

The first case, shown in Figure 4, will be the most common for most organizations as they start their journey towards establishing a common, integrated dataset enabling them to practice SE from a data-centric perspective. PM has its own legacy tools for budgeting and scheduling, each with its own methods of tracking various project management performance measures.  Some organizational elements have legacy tools to develop diagrams that are stored as electronic files (as compared to storing work products and their underlying data and information in a database). One organizational element may have a robust legacy requirements management tool (RMT) that has been in use for many years, but has a proprietary database schema.  Another part of the organization has just started using an analytical modeling tool that can be used to support the generation and management of various lifecycle work products and their underlying data and information, but doesn’t have all the robustness of the RMT, so requirements continue to be managed in the RMT and imported into the modeling tool via an ETL process.  Depending on which standards are supported by the tools, this process could be either manual or automated.  Another part of the organization has a legacy design tool that has been in use for many years and is compatible with neither the modeling tools nor the RMT.  The tracking of the system verification and validation activities may be done in the RMT, but not integrated with the various modeling work products and their underlying data and information.

The first case is less desirable in that the data and information from these legacy databases will have to go through the ETL process to get into the integrated dataset. Any changes made to the SE tool databases must also go through the often expensive and time-consuming ETL process before the changes can be reflected in the integrated dataset.  This makes it harder to keep the data in the integrated dataset current and consistent across all lifecycle process activities.  Also, anyone doing analysis, modifying/updating work products and their underlying data and information, or generating reports based on the data and information from the integrated dataset, will have to make sure that the data from these external databases is current and consistent.
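
One simple way to check whether the integrated dataset is current is to compare when each external database last changed against when it was last loaded. The sketch below assumes the project records both timestamps per source; the function name and the source names used in the example are hypothetical.

```python
from datetime import datetime
from typing import Dict, List


def stale_sources(last_modified: Dict[str, datetime],
                  last_loaded: Dict[str, datetime]) -> List[str]:
    """Return the tool databases that changed since their last ETL load.

    A source that has never been loaded is also reported as stale.
    """
    stale = []
    for source, modified in last_modified.items():
        loaded = last_loaded.get(source)
        if loaded is None or modified > loaded:
            stale.append(source)
    return sorted(stale)


# Example: the RMT changed after its last load, and the budget tool
# has never been loaded, so both are reported.
modified = {"rmt": datetime(2017, 1, 5), "design": datetime(2017, 1, 2),
            "budget": datetime(2017, 1, 3)}
loaded = {"rmt": datetime(2017, 1, 4), "design": datetime(2017, 1, 2)}
print(stale_sources(modified, loaded))
```

A check like this only flags that a reload is needed; it does not remove the cost of running the ETL process itself.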


Figure 5: Interim State: Most SE tool databases are included in the integrated dataset

The second case, shown in Figure 5, is preferred in that the integrated dataset contains the individual SE tool databases.  Because their schemas are consistent with the master schema and these tools support standards for interoperability, the data in the databases is compatible and can be shared.  This case is also preferred because there is only one “ground truth” for the project: the data in the integrated dataset is under strict configuration control and therefore represents the baseline status of the project at any given time.  Any of the “visualizations” of the data will represent the current state of the project.

The second case will most likely mean the organization will need to procure a new SE toolset.  This can be a big, expensive, and time-consuming step for most organizations.  If setting out on a path to procure a new SE toolset, it is advisable to choose SE tools that support the generation and management of multiple lifecycle work products and their underlying data and information, and especially tools that support interoperability standards for compatible tools, schemas, and databases.  The perfect case would be to procure a single SE tool that “does it all”, i.e., the one tool would result in having an integrated project dataset by default.  That would help to ensure all data and information is current and consistent across all lifecycle stages.  It is doubtful that such a single SE tool exists.

Note: in the second case, as shown in Figure 5, even though most of the SE tools have compatible databases included in the integrated dataset, the organization may still choose to continue to use some legacy systems, such as budgeting and scheduling applications the project is required to use, whose schema is not compatible with the integrated dataset. In this case, the data and information from these legacy systems will need to go through an ETL process in order to be usable by other SE tools.

As shown in Figure 6, the end state, the project has a master ontology and a master schema defined for its integrated dataset.  All the PM and SE tools used to generate work products create and maintain their data and information in a database that has a schema consistent with the master project dataset schema and that conforms to interoperability standards.  This allows these PM and SE tool databases to be compatible and to be included within the project’s integrated dataset.

Figure 6: End State: The project has one integrated dataset – INCOSE’s Vision 2025 realized.

Work products such as budgets, schedules, requirements, designs, diagrams, drawings, and analytical models, along with their underlying data and information, are created as part of the SE lifecycle process activities.  The data and information representing these work products are stored and managed electronically, either in databases or as electronic files which can be linked to other work products. These databases and files are combined into an integrated dataset that represents those work products and underlying data and information.   Guides, standards, policies, and procedures are included in the integrated dataset. The integrated dataset is managed via the enterprise and project data governance, records management, information management, and DB administration requirements and processes. In order for the data and information to be considered the “ground truth” as discussed earlier, the integrated dataset is under strict configuration control.
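
As one small illustration of what configuration control can rely on, the sketch below computes a fingerprint of a dataset so that any change from an approved baseline can be detected. It assumes the dataset can be serialized to JSON; the function name is hypothetical, and a real configuration management process involves far more than this (change requests, approvals, versioned baselines).

```python
import hashlib
import json


def baseline_fingerprint(dataset: dict) -> str:
    """Hash a canonical JSON form of the dataset so any change is detectable.

    Sorting keys makes the result independent of insertion order.
    """
    canonical = json.dumps(dataset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


approved = {"R-1": {"text": "The system shall start.", "status": "approved"}}
baseline = baseline_fingerprint(approved)

# Later: the same content in a different key order still matches the baseline
current = {"R-1": {"status": "approved", "text": "The system shall start."}}
print(baseline_fingerprint(current) == baseline)
```

If the fingerprints differ, something in the dataset changed since the baseline was recorded, and the change can be traced back through the configuration control process.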

Once the integrated dataset has been populated, it represents the ground truth concerning the state of the project and becomes the source for subsequent lifecycle activities and resulting work products and underlying data and information. Interoperability standards enable SE tools to share data.  The database management tools allow the project’s SE toolset to access data from the integrated dataset.

This integrated dataset becomes the foundation of all SE lifecycle activities for the project. This data-centric SE perspective is essential to manage the system development efforts across all lifecycles and address the challenges of increasingly complex systems.

Note: While the concepts of ontology and schema are critical, the details of how they are structured and implemented are beyond the scope of this document, as is the inclusion of examples for projects of different sizes and complexity. These are topics that can be addressed by the appropriate working groups who focus on these areas of interest.


To meet the intent of the MBSE initiative and move towards INCOSE’s Vision 2025, standards must be matured and adopted by the various PM and SE tool and database management system vendors.

This is a major issue that organizations and SE tool vendors need to address. As discussed in the INCOSE SE HB, (INCOSE 2015) section 5.6, information management process, there are several activities in work to develop tool interchange specifications so that ‘models’ and other work products and their underlying data and information can be shared among tools.  The HB states: “The STandard for the Exchange of Product (STEP)—ISO 10303 standard provides a neutral computer‐inter-operable representation of product data throughout the life cycle.

*  ISO 10303‐239 (AP 239), Product Life Cycle Support (PLCS), is an international standard that specifies an information model that defines what information can be exchanged and represented to support a product through life (PLCS, 2013).

*  INCOSE is a cosponsor of ISO 10303‐233, Application Protocol: Systems Engineering (2012). AP 233 is used to exchange data between a SysML™ and other SE application and then to applications in the larger life cycle of systems potentially using related ISO STEP data exchange capabilities.”

Another initiative by tool vendors to develop common schemas for life-cycle data is OSLC (Open Services for Lifecycle Collaboration).

Developing databases with a common schema would make it much simpler for SE tools to share datasets and integrate these databases into a common, integrated dataset.  This would make the SE tools interoperable by default. Conceptually, if the data is stored per a common standard and schema, the data can be shared between the various SE tools and these same tools can then be used to visualize the data in whatever form is needed by any stakeholder in the organization as shown in Figure 6.

This view is communicated clearly in the Boeing paper mentioned earlier (Malone et al 2016) “A perennial problem restricting data sharing is that modeling tools tend to be created independently, resulting in the tools having different and, often, incompatible data models. To enable data sharing, these separate data models need to be mapped, and a data transfer utility produced to perform intermediate data transformation as the data are passed between the tools. Creating and managing data utilities can easily become more expensive than managing the MBSE environments themselves. Compounding this problem is that data sharing among several tools becomes an ((N)(N-1))/2 scenario as individual data sharing utilities are built between tools.”
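
The N(N-1)/2 figure in the quote above is the number of distinct pairs among N tools. The short Python sketch below compares that count with the count under a common data model, assuming (this part is an assumption, not a claim from the paper) that each tool then needs only one adapter to the shared schema.

```python
def pairwise_utilities(n_tools: int) -> int:
    """Point-to-point sharing: one data transfer utility per pair of tools."""
    return n_tools * (n_tools - 1) // 2


def hub_adapters(n_tools: int) -> int:
    """Common data model: one adapter per tool to the shared schema (assumed)."""
    return n_tools


for n in (3, 5, 10):
    print(n, pairwise_utilities(n), hub_adapters(n))
# prints:
# 3 3 3
# 5 10 5
# 10 45 10
```

With three tools the two approaches cost the same, but the pairwise count grows quadratically: ten tools need 45 utilities point-to-point, against 10 adapters to a common schema.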

“To ameliorate the problems described above, professional associations should strive to publish standard MBSE data models, exchange standards, schema, and accompanying composition/ aggregation/construction rules. Boeing employees participate in standards groups within many of these associations to contribute to this effort. If these standards were supported by the MBSE community and imposed as requirements on the MBSE tool industry, data sharing across MBSE environments would be greatly facilitated. Although tool customization for specialized scenarios will, most likely, always be required, it would be beneficial if this customization were performed around a common data model core.”

“It would be appreciated if industry delivered MBSE platforms that feature a suite of tools incorporating: a robust, flexible hub that provides multiple, industry standard, data creation and manipulation views of the system architecture models; a common data model embedded across the tool suite; facile tool integration; and, straightforward data exchange utilities. No tool suite has been identified that provides a sufficient number of these features.”

Whether or not the SE tools being used by various organizational elements comply with a standard, an approach needs to be defined to integrate the different datasets into a common, integrated dataset.  Fortunately, the INCOSE Tool Integration and Model Lifecycle Management Working Group (TIMLM WG) is working to make this happen.  Their mission is to capture best practices and guidelines for using computer-based tools, exchanging data between tools, and allowing users to operate on this data.

Once the establishment of a common, integrated dataset becomes a reality, ideally, SE tools would be able to use this data to develop, display, and manage the various SE lifecycle work products and their underlying data and information.  Done properly, all the benefits of SE from a data-centric perspective stated at the beginning of this document can be realized.

The infrastructure identified in Figure 9 and the processes to perform the ETL functions need to be enabled by the enterprise, business management, and business operations levels.  With this infrastructure in place, the program/project can then define its unique needs in its PMP, SEMP, and IMP.

In part 5 of this blog series, I discuss topics to help organizations develop a systems engineering capability that meets the needs of their organization.

In the first section of part 5, I introduce the concept of SE capability levels (SCLs) to help organizations assess what their current SE capability is from an integrated dataset perspective and provide a roadmap to get to their desired level of SE capability based on their organization’s specific needs.

In the second section of part 5, I provide advice to help sell the idea of moving toward a data-centric practice of SE to management.

Comments are welcome.

If you have any other topics you would like addressed in our blog, feel free to let us know via our “Ask the Experts” page and we will do our best to provide a timely response.
