Data Generator

Generate Synthetic Data

Overview

The Data Generator in my8data creates synthetic test data for training, demonstration, and testing. Instead of relying on real measurement data, you can systematically generate datasets with specific, known statistical properties.

Applications

| Application | Description |
| --- | --- |
| Training and Education | Create exercise data with known properties to practice interpreting MFU, SPC, and process capability |
| Demonstration | Show customers or colleagues the capabilities of my8data with realistic but anonymized data |
| Module Testing | Verify the correct function of analysis modules with data whose expected results are known |
| Method Comparison | Compare different statistical methods using identical datasets |
| Sensitivity Analysis | Investigate how changes to parameters (e.g., variation, mean) affect process capability indices |

Info: The generated data comes from a random number generator and is suitable for statistical analysis. It does not represent real measurements and should not be used for quality decisions in production.

Quick Start

Generate a test dataset in just a few steps:

  1. Open the Data Generator from the main menu
  2. Select the desired distribution type (e.g., normal distribution)
  3. Set the parameters (mean, standard deviation, sample size)
  4. Click Generate
  5. Import the generated data directly into an analysis module or export it

Tip: To test the impact of various scenarios, generate multiple datasets with different parameters. For example, you can simulate a capable process (Cpk >= 1.67) and an incapable process (Cpk < 1.00) and compare the results.


Parameters and Configuration

Distribution Types

The data generator supports various distribution types to simulate different process situations:

| Distribution Type | Description | Typical Application |
| --- | --- | --- |
| Normal Distribution | Symmetric bell curve; most common in practice | Standard case for most manufacturing processes |
| Uniform Distribution | All values in range equally probable | Simulation of a process without clear central tendency |
| Log-Normal Distribution | Right-skewed; only positive values | Roughness, particle sizes, failure times |
| Weibull Distribution | Flexible shape; can be skewed or symmetric | Lifetime analysis, reliability |
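
To make the four distribution types concrete, the following is a minimal sketch using Python's standard `random` module. It is not how my8data generates its data internally; the function name `generate` and its keyword names are hypothetical, and the parameter values are the example values used in this document.

```python
import random
import statistics


def generate(distribution, n, rng, **p):
    """Draw n values from one of the four distribution types (illustrative only)."""
    if distribution == "normal":
        return [rng.gauss(p["mean"], p["sd"]) for _ in range(n)]
    if distribution == "uniform":
        return [rng.uniform(p["minimum"], p["maximum"]) for _ in range(n)]
    if distribution == "lognormal":
        # Parameters refer to the logarithm of the values
        return [rng.lognormvariate(p["mu_log"], p["sigma_log"]) for _ in range(n)]
    if distribution == "weibull":
        # random.weibullvariate expects (scale, shape), in that order
        return [rng.weibullvariate(p["scale"], p["shape"]) for _ in range(n)]
    raise ValueError(f"unknown distribution: {distribution}")


rng = random.Random(42)  # fixed seed so the sketch is repeatable
samples = {
    "normal": generate("normal", 100, rng, mean=10.0, sd=0.02),
    "uniform": generate("uniform", 100, rng, minimum=9.90, maximum=10.10),
    "lognormal": generate("lognormal", 100, rng, mu_log=2.30, sigma_log=0.10),
    "weibull": generate("weibull", 100, rng, scale=10.0, shape=3.5),
}

for name, values in samples.items():
    print(f"{name:10s} mean={statistics.mean(values):.3f} "
          f"sd={statistics.stdev(values):.3f}")
```

Comparing the printed mean and standard deviation across the four samples shows how strongly the choice of distribution type changes the data, even when the values are centered near 10.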

Configurable Parameters

Normal Distribution

| Parameter | Description | Example Value | Effect |
| --- | --- | --- | --- |
| Mean (μ) | Center of the distribution | 10.00 | Shifts the entire distribution left or right |
| Standard Deviation (σ) | Width of the distribution | 0.02 | Larger values create wider spread |
| Sample Size (n) | Number of values to be generated | 100 | More values reduce the uncertainty of the estimated statistics |

Uniform Distribution

| Parameter | Description | Example Value |
| --- | --- | --- |
| Minimum | Lower limit of the value range | 9.90 |
| Maximum | Upper limit of the value range | 10.10 |
| Sample Size (n) | Number of values to be generated | 100 |

Log-Normal Distribution

| Parameter | Description | Example Value |
| --- | --- | --- |
| μ (log) | Mean of the logarithmized characteristic | 2.30 |
| σ (log) | Standard deviation of the logarithmized characteristic | 0.10 |
| Sample Size (n) | Number of values to be generated | 100 |
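
Because μ (log) and σ (log) describe the logarithm of the characteristic, it is not obvious which values they produce on the original scale. The sketch below converts the example values with the standard log-normal formulas; the variable names are chosen for illustration only.

```python
import math

mu_log = 2.30     # mean of the logarithmized characteristic
sigma_log = 0.10  # standard deviation of the logarithmized characteristic

# Standard log-normal relationships on the original (non-log) scale
median = math.exp(mu_log)
mean = math.exp(mu_log + sigma_log ** 2 / 2)
sd = mean * math.sqrt(math.exp(sigma_log ** 2) - 1)

print(f"median = {median:.3f}")
print(f"mean   = {mean:.3f}")
print(f"sd     = {sd:.3f}")
```

With the example values, the median is about 9.97 and the mean lies slightly above it, which reflects the right skew of the distribution.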

Weibull Distribution

| Parameter | Description | Example Value |
| --- | --- | --- |
| Shape Parameter (k) | Determines the shape of the distribution | 3.5 |
| Scale Parameter (λ) | Characteristic lifetime / scaling | 10.00 |
| Sample Size (n) | Number of values to be generated | 100 |

Info: For the Weibull distribution, a shape parameter k < 1 produces a monotonically decreasing density with a decreasing failure rate (early failures), k = 1 corresponds to an exponential distribution with a constant failure rate (random failures), and k > 1 gives an increasing failure rate (wear-out failures). For k of about 3 to 4, the distribution is approximately bell-shaped.
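
The effect of the shape parameter is easiest to see in the failure rate (hazard function) of the Weibull distribution, h(t) = (k/λ)·(t/λ)^(k−1). The following sketch evaluates it for three shape values; the helper names `weibull_hazard` and `trend` are hypothetical and serve only this illustration.

```python
import math


def weibull_hazard(t, shape, scale):
    """Failure rate h(t) = (k/λ) * (t/λ)**(k - 1) for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def trend(rates):
    """Classify a sequence of failure rates as constant, decreasing, increasing, or mixed."""
    pairs = list(zip(rates, rates[1:]))
    if all(math.isclose(a, b) for a, b in pairs):
        return "constant"
    if all(a > b for a, b in pairs):
        return "decreasing"
    if all(a < b for a, b in pairs):
        return "increasing"
    return "mixed"


scale = 10.0
times = [2.0, 5.0, 10.0, 20.0]
trends = {}
for shape in (0.5, 1.0, 3.5):
    rates = [weibull_hazard(t, shape, scale) for t in times]
    trends[shape] = trend(rates)
    print(f"k = {shape}: failure rate is {trends[shape]}")
```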

Advanced Options

In addition to the basic parameters, you can make the following advanced settings:

| Option | Description | Default Value |
| --- | --- | --- |
| Seed | Starting value for the random generator; enables reproducible results | Random |
| Decimal Places | Number of decimal places | 3 |
| Add Outliers | Deliberately add outliers to the dataset | Disabled |
| Number of Outliers | How many outliers to insert | 0 |
| Outlier Range | Range in which the outliers should lie | ±4σ to ±6σ |
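
As an illustration of the outlier and decimal-place options, the sketch below inserts a given number of outliers at a distance of 4σ to 6σ from the mean and rounds all values. How my8data actually places outliers is not described here; inserting them at random positions with a random sign is an assumption, and `add_outliers` is a hypothetical helper.

```python
import random


def add_outliers(values, mean, sd, count, rng, low=4.0, high=6.0, decimals=3):
    """Insert `count` outliers between low*sd and high*sd away from the mean."""
    result = list(values)
    for _ in range(count):
        distance = rng.uniform(low, high) * sd
        sign = rng.choice((-1, 1))                 # above or below the mean
        position = rng.randrange(len(result) + 1)  # assumed: random position
        result.insert(position, mean + sign * distance)
    return [round(v, decimals) for v in result]


rng = random.Random(7)
base = [rng.gauss(10.0, 0.02) for _ in range(100)]
data = add_outliers(base, mean=10.0, sd=0.02, count=3, rng=rng)

far_out = [v for v in data if abs(v - 10.0) > 0.079]
print(len(data), "values, of which", len(far_out), "lie roughly 4σ or more from the mean")
```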

Tip: Use the Seed function when you need reproducible results. With the same seed and identical parameters, you will always get the same dataset. This is particularly useful for training where all participants should work with the same data.
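
The principle behind the seed can be shown with any seeded pseudo-random generator. The sketch below uses Python's `random.Random`; the function `generate_normal` is hypothetical, and the seed values are arbitrary.

```python
import random


def generate_normal(mean, sd, n, seed=None):
    """Generate n normally distributed values; seed=None picks an unpredictable start."""
    rng = random.Random(seed)
    return [round(rng.gauss(mean, sd), 3) for _ in range(n)]


first = generate_normal(10.0, 0.02, 100, seed=1234)
second = generate_normal(10.0, 0.02, 100, seed=1234)
other = generate_normal(10.0, 0.02, 100, seed=5678)

print(first == second)  # same seed and parameters: identical dataset
print(first == other)   # different seed: a different dataset
```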

Practical Examples

Example 1: Simulate a Capable Process

Goal: Generate a dataset with Cpk ≈ 1.67

| Parameter | Value | Rationale |
| --- | --- | --- |
| Distribution | Normal Distribution | Standard case |
| Mean | 10.000 | Centered on target value |
| Standard Deviation | 0.010 | Low variation |
| Sample Size | 100 | Sufficient for Ppk calculation |
| USL | 10.050 | Tolerance width = 0.100 mm |
| LSL | 9.950 | |

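Expected Result: Cp ≈ Cpk ≈ 1.67 (tolerance width 0.100 / (6 × 0.010) = 1.67). Because the data is random, the indices calculated from a generated sample scatter around this theoretical value.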
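
The expected result can be checked outside my8data with a short script. This is a sketch under assumptions: it estimates σ with the overall sample standard deviation, which may differ from the estimator an analysis module uses, and the function name `capability` is hypothetical.

```python
import random
import statistics


def capability(values, lsl, usl):
    """Return (Cp, Cpk) based on the sample mean and sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    cp = (usl - lsl) / (6 * sd)
    cpk = min(usl - mean, mean - lsl) / (3 * sd)
    return cp, cpk


rng = random.Random(1)  # arbitrary seed for a repeatable run
data = [rng.gauss(10.000, 0.010) for _ in range(100)]
cp, cpk = capability(data, lsl=9.950, usl=10.050)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")
```

Running the script with different seeds gives an impression of how much the indices vary from sample to sample at n = 100.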

Example 2: Simulate a Decentered Process

Goal: Generate a dataset with good Cp but poor Cpk

| Parameter | Value | Rationale |
| --- | --- | --- |
| Distribution | Normal Distribution | Standard case |
| Mean | 10.025 | Deliberately off target |
| Standard Deviation | 0.010 | Same variation as Example 1 |
| Sample Size | 100 | |
| USL | 10.050 | |
| LSL | 9.950 | |

Expected Result: Cp ≈ 1.67, but Cpk ≈ 0.83 (Process variation is narrow but decentered)
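
The two figures follow from the standard definitions of Cp and Cpk when the configured parameters (μ = 10.025, σ = 0.010) are inserted directly. Values estimated from a generated sample will deviate slightly from this.

```latex
\begin{align*}
C_p &= \frac{\mathrm{USL} - \mathrm{LSL}}{6\sigma}
     = \frac{10.050 - 9.950}{6 \cdot 0.010} \approx 1.67 \\
C_{pk} &= \frac{\min(\mathrm{USL} - \mu,\ \mu - \mathrm{LSL})}{3\sigma}
        = \frac{\min(0.025,\ 0.075)}{3 \cdot 0.010} \approx 0.83
\end{align*}
```

Cp depends only on the tolerance width and the variation, so it is unchanged from Example 1. Cpk is determined by the specification limit closer to the mean, here the USL.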

Warning: Synthetically generated data is drawn from exactly the selected distribution. Real process data often deviates from ideal distributions, so results obtained with generated data are usually "cleaner" than analyses of real data.
