Generating a set of data


15 Nov 2017

I just released a small project on GitHub that provides an easy way to generate sample data and push it to something like an AWS Kinesis stream. This is handy if, for example, you need to build a POC or a demo and require a data set.

Here is the link to the GitHub project: https://github.com/alfallouji/ALFAL-AWSBOOTCAMP-DATAGEN

Or to make it easier, you can click here to deploy it on AWS.

Please keep the following in mind: this code is provided free of charge. If you decide to deploy it on AWS (using the CloudFormation script), you may incur charges for the AWS resources you use (e.g. EC2, S3, Kinesis).

The structure of the generated data can be defined within a configuration file.

The following features are supported:

  • Random integer (within a min-max range)
  • Random element from a list
  • Random element from a weighted list (e.g. 'elem1' => 20% chance, 'elem2' => 40% chance, etc.)
  • Constant
  • Timestamp / Date
  • Counter (increment & decrement)
  • Mathematical expression using previously defined fields, e.g. ({field1} + {field2} / 4) * {field3}
  • Conditional rules, e.g. {field3} equals TRUE if {field1} + {field2} < 1000; {field4} equals FALSE if {field1} + {field2} >= 1000
  • Any of the features exposed by the fzaninotto/faker library
  • Ability to define the overall distribution (e.g. I want 20% of my population to have a value of 'Y' for {field3}). The generator will run until it meets the desired distribution.
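
To give an idea of how a weighted list can be turned into a random pick, here is a minimal sketch. It is not taken from the project: the function name pickWeighted is hypothetical, and it assumes the weights are non-negative integers.

```php
<?php
// Hypothetical helper, not part of the project: picks one key from a
// weighted list such as ['men' => 40, 'women' => 60].
// Assumes the weights are non-negative integers.
function pickWeighted(array $weights): string
{
    $total = array_sum($weights);
    if ($total <= 0) {
        throw new InvalidArgumentException('Weights must sum to a positive number.');
    }

    // Draw an integer in [1, total], then walk the list until the
    // running sum of the weights covers the draw.
    $draw = mt_rand(1, $total);
    $running = 0;
    foreach ($weights as $value => $weight) {
        $running += $weight;
        if ($draw <= $running) {
            return (string) $value;
        }
    }

    // Not reachable with positive integer weights; kept as a safety net.
    return (string) array_key_last($weights);
}

// Usage: over many draws, 'women' should come up roughly 60% of the time.
$counts = ['men' => 0, 'women' => 0];
for ($i = 0; $i < 10000; $i++) {
    $counts[pickWeighted(['men' => 40, 'women' => 60])]++;
}
```

The weights do not have to add up to 100, since only their relative size matters in this sketch.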

Here is an example of a configuration file:

// Define the desired distribution (optional)
'distribution' => array(
    // We want to have 30% of our distribution with a value of 'Y'
    // for the result field and 70% with a value of 'N'
    'result' => array(
        'Y' => 3,
        'N' => 7,
    ),
),

// Define the desired fields (mandatory)
'fields' => array(

    // You can use date function and provide the desired format
    'time' => array(
        'type' => 'date',
        'format' => 'Y-m-d H:i:s',
    ),

    // Randomly pick an integer number between 10 and 100
    'field1' => array(
        'type' => 'randomNumber',
        'randomNumber' => array(
            'max' => '100',
            'min' => '10',
        ),
    ),

    // Field2 is a constant equal to 1000 (could be any string)
    'field2' => array(
        'type' => 'constant',
        'constant' => '1000',
    ),

    // Randomly pick an element from a defined list of values
    'field3' => array(
        'type' => 'randomList',
        'randomList' => array(
            'us',
            'europe',
            'asia',
        ),
    ),

    // Pick an element from a weighted list
    'field4' => array(
        'type' => 'weightedList',
        'weightedList' => array(
            'men' => 40,
            'women' => 60,
        ),
    ),

    // You can use a mathematical expression that references previously defined fields
    'field5' => array(
        'type' => 'mathExpression',
        'mathExpression' => '{field1} + {field2} + sin({field2}) * 10',
    ),

    // You can use any of the faker features
    'field6' => array(
        'type' => 'faker',
        'property' => 'name',
    ),

    'field7' => array(
        'type' => 'faker',
        'property' => 'email',
    ),

    'field8' => array(
        'type' => 'faker',
        'property' => 'ipv4',
    ),

    // You can define conditional rules that are evaluated to determine the value.
    // For example, if this condition is true:
    // {field1} + {field2} > 1060, then the value for {result} is 'Y'
    'result' => array(
        'type' => 'rules',
        // Value => condition
        'rules' => array(
            'Y' => '{field1} + {field2} > 1060',
            'N' => '{field1} + {field2} <= 1060',
        ),
    ),
),
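
To illustrate what "run until it meets the desired distribution" could look like, here is a rough sketch. It is not the project's actual code: generateWithDistribution and $makeRecord are hypothetical names, and I am assuming that the 'Y' => 3 / 'N' => 7 values are ratios scaled to a requested total number of records. The record factory hard-codes field1, field2 and the 'result' rules from the example above instead of parsing the configuration.

```php
<?php
// Hypothetical sketch, not the project's code: keep generating records
// until each value of $field has reached its share of $total.
function generateWithDistribution(callable $makeRecord, string $field, array $ratios, int $total): array
{
    // Turn ratios such as ['Y' => 3, 'N' => 7] into per-value quotas.
    // intdiv rounds down, so fewer than $total records may be returned
    // when $total is not a multiple of the sum of the ratios.
    $sum = array_sum($ratios);
    $quotas = [];
    foreach ($ratios as $value => $ratio) {
        $quotas[$value] = intdiv($ratio * $total, $sum);
    }

    $records = [];
    while (array_sum($quotas) > 0) {
        $record = $makeRecord();
        $value = $record[$field];
        // Keep the record only if its value is still under quota.
        if (($quotas[$value] ?? 0) > 0) {
            $quotas[$value]--;
            $records[] = $record;
        }
    }

    return $records;
}

// Record factory mirroring field1, field2 and the 'result' rules above.
$makeRecord = function (): array {
    $field1 = mt_rand(10, 100);
    $field2 = 1000;
    return [
        'field1' => $field1,
        'field2' => $field2,
        'result' => ($field1 + $field2 > 1060) ? 'Y' : 'N',
    ];
};

$records = generateWithDistribution($makeRecord, 'result', ['Y' => 3, 'N' => 7], 100);
$counts = array_count_values(array_column($records, 'result'));
```

Note that a loop like this only terminates if the factory can actually produce every value listed in the distribution; a real implementation would probably want a cap on the number of attempts.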

Who am I?

My name is Bashar Al-Fallouji, I work as an Enterprise Solutions Architect at Amazon Web Services (Sydney, Australia).

I am particularly interested in Cloud Computing, Web applications, Open Source Development, Software Engineering, Information Architecture, Unit Testing, XP/Agile development, etc.

On this blog, you will find mostly technical articles and thoughts around PHP, OOP, OOD, Unit Testing, etc. I am also sharing a few open source tools and scripts.
