Data Generator

Overview

The Data Generator in my8data creates synthetic test data for training, demonstration, and testing purposes. Rather than relying on real measurement data, you can systematically generate datasets with specific statistical properties.

Applications

Application	Description
Training and Education	Create exercise data with known properties to practice interpreting MFU, SPC, and process capability
Demonstration	Show customers or colleagues the capabilities of my8data with realistic but anonymized data
Module Testing	Verify the correct function of analysis modules with data whose expected results are known
Method Comparison	Compare different statistical methods using identical datasets
Sensitivity Analysis	Investigate how changes to parameters (e.g., variation, mean) affect process capability indices

Info: The generated data is created using a random generator and is suitable for statistical analysis. It does not represent real measurement values and should not be used for quality decisions in production.

Quick Start

Generate a test dataset in just a few steps:

1. Open the Data Generator from the main menu
2. Select the desired distribution type (e.g., normal distribution)
3. Set the parameters (mean, standard deviation, sample size)
4. Click Generate
5. Import the generated data directly into an analysis module or export it

Tip: To test the impact of various scenarios, generate multiple datasets with different parameters. For example, you can simulate a capable process (Cpk >= 1.67) and an incapable process (Cpk < 1.00) and compare the results.

Parameters and Configuration

Distribution Types

The data generator supports various distribution types to simulate different process situations:

Distribution Type	Description	Typical Application
Normal Distribution	Symmetric bell curve; most common in practice	Standard case for most manufacturing processes
Uniform Distribution	All values in the range are equally probable	Simulation of a process without a clear central tendency
Log-Normal Distribution	Right-skewed; only positive values	Roughness, particle sizes, failure times
Weibull Distribution	Flexible shape; can be skewed or symmetric	Lifetime analysis, reliability
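
As a rough illustration of what the four distribution types produce, the following sketch draws values with Python's standard `random` module. It is not how my8data generates data internally; the helper `generate` and its parameter names are hypothetical, and the example values are taken from the parameter tables in this section.

```python
import random

def generate(distribution, n, seed=None, **params):
    """Draw n values from one of the four distribution types (hypothetical helper)."""
    rng = random.Random(seed)  # separate generator, global random state stays untouched
    if distribution == "normal":
        draw = lambda: rng.gauss(params["mean"], params["sd"])
    elif distribution == "uniform":
        draw = lambda: rng.uniform(params["minimum"], params["maximum"])
    elif distribution == "lognormal":
        draw = lambda: rng.lognormvariate(params["mu_log"], params["sigma_log"])
    elif distribution == "weibull":
        # random.weibullvariate expects the scale first, then the shape
        draw = lambda: rng.weibullvariate(params["scale"], params["shape"])
    else:
        raise ValueError(f"unknown distribution: {distribution}")
    return [draw() for _ in range(n)]

normal = generate("normal", 100, seed=1, mean=10.00, sd=0.02)
uniform = generate("uniform", 100, seed=1, minimum=9.90, maximum=10.10)
lognormal = generate("lognormal", 100, seed=1, mu_log=2.30, sigma_log=0.10)
weibull = generate("weibull", 100, seed=1, scale=10.00, shape=3.5)
```

Each call returns a plain list of 100 floats that could be exported or inspected further.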

Configurable Parameters

Normal Distribution

Parameter	Description	Example Value	Effect
Mean (μ)	Center of the distribution	10.00	Shifts the entire distribution left or right
Standard Deviation (σ)	Width of the distribution	0.02	Larger values create wider spread
Sample Size (n)	Number of values to be generated	100	More values reduce the random scatter of the estimated statistics (mean, standard deviation, capability indices)
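
The effect of the sample size can be made concrete with the standard error of the mean, σ / √n. The sketch below uses the example standard deviation from the table; the function name is chosen here for illustration only.

```python
import math

def standard_error_of_mean(sigma, n):
    """Standard deviation of the sample mean for n independent values."""
    return sigma / math.sqrt(n)

sigma = 0.02  # example standard deviation from the table above
for n in (25, 100, 400):
    print(f"n = {n:>3}: standard error of the mean = {standard_error_of_mean(sigma, n):.4f}")
```

Quadrupling the sample size halves the standard error, so the mean of a generated dataset lands closer to the configured mean as n grows.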

Uniform Distribution

Parameter	Description	Example Value
Minimum	Lower limit of the value range	9.90
Maximum	Upper limit of the value range	10.10
Sample Size (n)	Number of values to be generated	100

Log-Normal Distribution

Parameter	Description	Example Value
μ (log)	Mean of the logarithmized characteristic	2.30
σ (log)	Standard deviation of the logarithmized characteristic	0.10
Sample Size (n)	Number of values to be generated	100
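
Because μ (log) and σ (log) refer to the logarithmized values, it can help to convert them to the original scale. For a log-normal distribution the median is exp(μ) and the mean is exp(μ + σ²/2); the sketch below applies both formulas to the example values, assuming natural logarithms.

```python
import math

mu_log = 2.30     # mean of the logarithmized characteristic (table above)
sigma_log = 0.10  # standard deviation of the logarithmized characteristic

median = math.exp(mu_log)                   # median on the original scale
mean = math.exp(mu_log + sigma_log**2 / 2)  # mean on the original scale

print(f"median = {median:.3f}, mean = {mean:.3f}")
```

With these example values the generated data is centered at roughly 9.97 (median) and 10.02 (mean) on the original scale.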

Weibull Distribution

Parameter	Description	Example Value
Shape Parameter (k)	Determines the shape of the distribution	3.5
Scale Parameter (λ)	Characteristic lifetime / scaling	10.00
Sample Size (n)	Number of values to be generated	100

Info: For the Weibull distribution, a shape parameter k < 1 produces a monotonically decreasing density with a falling failure rate (early failures), k = 1 corresponds to an exponential distribution with a constant failure rate (random failures), and k > 3 yields an approximately bell-shaped distribution with a rising failure rate (wear-out failures).
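
The three regimes can be checked with the Weibull failure rate h(t) = (k / λ) · (t / λ)^(k − 1). The following sketch evaluates it at an early and a late point in time for three shape parameters; the function name and the chosen time points are illustrative assumptions.

```python
def weibull_hazard(t, shape, scale):
    """Failure rate h(t) = (k / λ) * (t / λ)**(k - 1) of a Weibull distribution."""
    return (shape / scale) * (t / scale) ** (shape - 1)

scale = 10.0  # example scale parameter from the table above
for shape in (0.5, 1.0, 3.5):
    early = weibull_hazard(2.0, shape, scale)
    late = weibull_hazard(8.0, shape, scale)
    if late < early:
        trend = "decreasing"
    elif late == early:
        trend = "constant"
    else:
        trend = "increasing"
    print(f"k = {shape}: h(2) = {early:.4f}, h(8) = {late:.4f} -> {trend}")
```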

Advanced Options

In addition to the basic parameters, you can make the following advanced settings:

Option	Description	Default Value
Seed	Starting value for the random generator; enables reproducible results	Random
Decimal Places	Number of decimal places	3
Add Outliers	Deliberately add outliers to the dataset	Disabled
Number of Outliers	How many outliers to insert	0
Outlier Range	Range in which the outliers should lie	±4σ to ±6σ
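
One way to picture the outlier and decimal-place options is the sketch below. It assumes that outliers replace existing values and are placed uniformly between 4σ and 6σ on either side of the mean; whether my8data replaces or appends values, and how it distributes them within the range, is not specified here. The helper `add_outliers` is hypothetical.

```python
import random

def add_outliers(values, mean, sd, count, rng, low=4.0, high=6.0):
    """Replace `count` randomly chosen values with outliers 4σ to 6σ from the mean."""
    result = list(values)
    for index in rng.sample(range(len(result)), count):
        distance = rng.uniform(low, high) * sd  # distance from the mean
        sign = rng.choice((-1, 1))              # below or above the mean
        result[index] = mean + sign * distance
    return result

rng = random.Random(42)
mean, sd = 10.00, 0.02
data = [rng.gauss(mean, sd) for _ in range(100)]
data = add_outliers(data, mean, sd, count=3, rng=rng)
data = [round(value, 3) for value in data]  # corresponds to "Decimal Places" = 3
```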

Tip: Use the Seed function when you need reproducible results. With the same seed and identical parameters, you will always get the same dataset. This is particularly useful for training where all participants should work with the same data.
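
The principle behind the seed can be demonstrated with any seeded random generator. The sketch below uses Python's `random.Random`; the seed values are arbitrary, and a seed used here will not reproduce a dataset generated in my8data with the same number.

```python
import random

def generate_normal(mean, sd, n, seed):
    """Generate n normally distributed values, rounded to 3 decimal places."""
    rng = random.Random(seed)
    return [round(rng.gauss(mean, sd), 3) for _ in range(n)]

first = generate_normal(10.00, 0.02, 100, seed=2024)
second = generate_normal(10.00, 0.02, 100, seed=2024)
other = generate_normal(10.00, 0.02, 100, seed=2025)

print(first == second)  # same seed and parameters
print(first == other)   # different seed
```

The first comparison shows that identical seeds and parameters give identical datasets, while changing only the seed gives a different dataset.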

Practical Examples

Example 1: Simulate a Capable Process

Goal: Generate a dataset with Cpk >= 1.67

Parameter	Value	Rationale
Distribution	Normal Distribution	Standard case
Mean	10.000	Centered on target value
Standard Deviation	0.010	Low variation
Sample Size	100	Sufficient for Ppk calculation
USL	10.050	Tolerance width = 0.100 mm
LSL	9.950

Expected Result: Cp ≈ Cpk ≈ 1.67 (tolerance width / 6σ = 0.100 / (6 × 0.010) ≈ 1.67; Cpk equals Cp because the mean is centered between the limits)
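
The expected values follow from the usual definitions Cp = (USL − LSL) / 6σ and Cpk = min(USL − μ, μ − LSL) / 3σ. A minimal sketch of this arithmetic, using the configured parameters rather than a generated sample (the function name is illustrative):

```python
def capability(mean, sd, lsl, usl):
    """Theoretical Cp and Cpk for a normal process with known mean and sd."""
    cp = (usl - lsl) / (6 * sd)
    cpk = min(usl - mean, mean - lsl) / (3 * sd)
    return cp, cpk

cp, cpk = capability(mean=10.000, sd=0.010, lsl=9.950, usl=10.050)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")  # Cp = 1.67, Cpk = 1.67
```

Indices calculated from an actual generated dataset will scatter around these values, because the mean and standard deviation are then estimated from 100 random values.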

Example 2: Simulate a Decentered Process

Goal: Generate a dataset with good Cp but poor Cpk

Parameter	Value	Rationale
Distribution	Normal Distribution	Standard case
Mean	10.025	Deliberately off target
Standard Deviation	0.010	Same variation as Example 1
Sample Size	100
USL	10.050
LSL	9.950

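Expected Result: Cp ≈ 1.67, but Cpk ≈ 0.83 (distance from the mean to the nearer limit / 3σ = (10.050 - 10.025) / (3 × 0.010) ≈ 0.83; the variation is small, but the process is off-center)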
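
To see how closely a single generated dataset matches these expected values, the sketch below simulates Example 2 and estimates both indices from the sample. It assumes the overall sample standard deviation as the estimator; my8data may use a different calculation method, and the seed is arbitrary.

```python
import random
import statistics

rng = random.Random(7)  # arbitrary seed so the sketch is repeatable
sample = [rng.gauss(10.025, 0.010) for _ in range(100)]

lsl, usl = 9.950, 10.050
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)  # sample standard deviation (n - 1 in the denominator)

cp_hat = (usl - lsl) / (6 * s)
cpk_hat = min(usl - x_bar, x_bar - lsl) / (3 * s)
print(f"Cp = {cp_hat:.2f}, Cpk = {cpk_hat:.2f}")
```

The estimates should land near 1.67 and 0.83 without matching them exactly, since s and the sample mean vary from dataset to dataset.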

Warning: Synthetically generated data is drawn from exactly the selected distribution, apart from random sampling scatter. Real process data often deviates from ideal distributions, so results obtained with generated data are usually "cleaner" than real analyses.
