ID.ai User Manual

Version 1.0 : 23/09/2024

Introduction

This User Manual (UM) provides the information necessary for analysts to effectively use the ID.ai platform for building – and developing documentation – statistical models needed for data driven business decision making.

Corestrat’s Model.ai offers model-building intelligence by removing the complexity of developing a predictive model without you having to write a single line of code. AI embedded Model.ai builds predictive models employing multiple ML techniques in quick seconds for the uploaded data and for your target goals

The no-code, enterprise-ready platform that allows you to build and deploy classification and/or regression predictive models in a few clicks. Model.ai helps enterprise users to get closer to smarter business decisions by capturing actionable and hidden patterns in the data.

Model.ai automates a large part of repetitive machine learning steps to ease the tasks of data scientists & non data scientists, thereby enabling enterprises swiftly adopting ML solutions and allowing them to focus on more complex issues.

Software Requirements Specifications

This section provides information about the minimum hardware and software configurations for installing the ID.ai software.

Operating System	Windows 10 & Higher
Ram	>=16GB
Disk Space	>=200GB
Software	MS Word 16 & Higher

Any configuration below the acceptable versions/configurations described above will result in installation and/or performance issues of ID.ai.

Installation of ID.ai

This section provides the steps to install the ID.ai application on the user’s laptop.

On purchase, Corestrat will provide an exe file named “ID.ai” exe file which looks like the following image.

User should click on this exe file and will get the pop-up screen shown below requesting the user to proceed with the subsequent installation steps. Click on “Install”. User should right click and run as administrator in case privileges are insufficient for direct installation.

The user needs to click on “Finish” once installation is complete.

The user will get a shortcut to open the ID.ai application on the desktop.

Click on the ID.ai icon which is shown below.

Getting Started

On opening the application user must input username and license key which would have been provided at the time of purchase. Please enter these in the relevant boxes in the opening screen shown below to proceed further.

There are six stages in building the desired outcome to drive data driven decision making at scale; these stages are illustrated in sequence in the picture below.

The various steps in the six stages will be described in the subsequent sections.

Upload Data Section

Project Creation & File Input

For building a new project, click on “New Project” and assign the project name in the pop-up screen as shown below.

Once user enters the name of the new project, (s)he can either just save it and work it it later or start working on it immediately using the respective options in the pop-up as shown below.

In the subsequent screen, the “project_name” and “project path” are displayed at top left side of screen. For starting a new project, the user must point to the folder location and file name of the input data source.

ID.ai supports data upload in any of the following formats:

CSV
Excel
Feather
Parquet
Flat text files with delimiters

The minimum number of records needed is 200 with at least 5 columns and a minimum target variable count of 10.

If an invalid file is used for input, the following error message pops up.

User needs to click on “Close” in the pop-up and go back and upload a dataset in one of the correct formats.

Import from Databricks

B) ID.ai supports data imports from Databricks.

1. Click on Import Data from DB.

2. Add Credentials and Click on Test Connection.

3. Write your SQL query to get and view the data. Click Next, once you are content with dataset and want to use it for model building .

User needs to click on “Close” in the pop-up and go back and upload a dataset in one of the correct formats.

Existing Projects

1. The list of recent projects worked by the user earlier is provided; user also has the option to search the project names

2. For importing a project from a different location user must

a. Select the Import Existing Project option.
b. Browse to the location where your project is saved.
c. Choose the project folder you wish to import
d. Confirm the import, and the project will be added here for you to view.

3. Once user clicks on “Launch Project” then import project is successfully loaded

Input Data Management

Once the input file has been uploaded, a summary records count will be displayed along with the first 1000 rows’ actual data. Do check that the row and column counts match with the actual input. The screen also provides information on empty and duplicate rows & columns.

The user has the following options to customize the inputs.

1. Remove duplicates and keep only unique records

2. Consider or ignore rows & columns with NaN (“not-a-number”) values

3. Provide a list of potential columns to exclude (e.g. phone number, PIN code) for faster computation.

These options can be selected/unselected by using the check box(es) under the “Rows and Columns” heading in the left of the screen as shown in the following screenshot.

ID.ai offers the following features:

1. Convert numerical columns to categorical columns and vice versa.

2. Provides a list of columns having fewer than 20 distinct variable values as likely candidates for treatment as categorical

3. Indicates some likely candidates for dropping from model build since they are likely to have no predictive power.

These options are highlighted on the right side of the screen as shown below.

In case the user already has a dataset where the target variables have already been defined and would like to use this file for model build, (s)he can upload this using the “Add Meta Data” option in the top right portion of the screen. This approach already has the ignore variables option already populated with likely exclusion candidates.

The procedure to use this file is similar to the normal approach described earlier – using either “Drag and Drop” or “Browse File” options as shown below.

Once use clicks on “Show sample” a small pop-up displays the likely candidates for the role categories as shown below.

If user would like to view the above in Excel format, (s)he needs to click on “Download Sample” and the same will be displayed in Excel format as shown below.

Once the Meta file is uploaded, the key characteristics are displayed on screen as shown below.

Once the user applies the relevant changes by clicking on “Apply Changes”, the summary of these is displayed.

Feature Engineering:

Two-Way Interaction: A two-way interaction shows how the effect of one variable on an outcome changes depending on the level of another variable.

The user selects variables from the ‘List of Categorical Variables’ and clicks the ‘>’ symbol to move them to the ‘Selected Variables’ list.

The user selects variables from the ‘Selected Variables’ and clicks the ‘Apply’ button to move them to the “New Created Variable” list.

After clicking on “Next” button to move them to the “Code-it Yourself” section.

Code-It-Yourself : Perform feature engineering by writing your own custom python code.

Simply input your code to create or modify features based on your data.

This gives you full flexibility to tailor features to your specific needs.

Variable List : Hover over each variable to check the variable type. Double click on the variable to populate it in the code box immediately.

After click on “Compile” and “Execute” button then new variable name is populated in List of Engineering Features Table.

Once the user has selected and customized the input variables, key summary statistics for these are provided in the next screen:

1. For numerical variables the total record count, count of records with missing values, mean, median, skewness and kurtosis values are provided.

2. For categorical variables the total record count, count of records with missing values, the value and count of the maximum occurring category within each categorical variable is provided.

To get a visual representation of any variable’s distribution, the user can click on the bar chart icon under the histogram column (blue oval in the screenshot above). The resultant chart provides the lower and higher bound values, the 25^th/50^th/75^th percentile values as well as any outliers. The range is also split into 10 bins by default and the count and share in each bin is provided.

Variable Transformation:

By clicking on ‘Transform,’ a popup screen opens displaying Box-Whisker and Histogram charts.

By choosing Transformation Type then creating new box-whisker and Histogram charts

Here by click on “yes” to retain variable other wise “no” to not retain variable. it will saved by default “Transformed Variable name” or custom variable name then to click on “save” button.

Feature Engneered Numerical Variables,there is an option for transformation of variable

Feature Engneering Categorical Variables

Select Variables Section

Target Variable Selection

The first step is to select the variable that represents the outcome being modelled. ID.ai provides the following:

1. List of all variables in the dataset which have fewer than 10 distinct values called “candidate target variables”.

2. User can select one value from this list by moving it from the list on the left to the right using the “>” arrow.

3. To reverse the selection the user can move variables from the right to the left using the “<” arrow. (both this are indicated in the blue oval highlighted portion in the screenshot)

Once the target variable is selected, the next screen takes the user to the “Define target categories”. This provides a list of all the distinct values within the selected target variable. The user needs to select which among these distinct values will be considered as desired outcomes and which do not.

Stratified Sampling

The next step is to split the input dataset into the “train” and “test” samples. The model will be built on the “train” sample and the results will be applied on the “test” sample to check the integrity of the model build. The default is set at 70/30 split between train and test respectively, although the user has the option to customize the split.

Independent Variables Exclusion

The user can – based on business context – decide to remove one or more independent variables from consideration while building the model. The “variables to be ignored” tab provides the list of all independent variables and user can exclude specific ones by using the “>” arrow to move these from the left to the right.

Independent Variable Insights

Users can combine multiple values of each categorical variable into custom value groups based on similar predictive power or business context. The next few steps indicate the procedures for the same.

Categorical Variables

1. In the tab named “Target Rate Insights by Variables” the a). count , b.) correlation between the target variable and the independent variable c.) information value and d.) a histogram of this independent variable’s splits and bad rate is provided.
2. If the user needs to combine some of the categories within this independent variable to get fewer categories, user needs to click on “Perform Manual Binning”
3. This leads to the next screen where “student” and “premier” categories are assigned the same value of 2(using the “Key in your splits” column) indicating they are both combined into this single category whereas “regular” category is retained as a separate category with a value of 1.
4. The resultant histogram providing the results from this modification is also generated.

Once the user is comfortable with this modification, (s)he needs to click on “submit” to save these changes.

Numerical Variables

For numerical variables similar metrics are provided as in categorical variables; the main difference being that three default segments are used as splits for the independent variable.

User can change the bin sizes and thresholds by using values for each bin’s upper and lower bound values using the “Key in Splits by Comma” box.

Once the user has analysed the results of the custom splits, user can “Confirm and Submit” the changes made.

Information Value

Information Value is a measure of the predictive power of the independent variable on the target variable. The IV value thresholds for suspicious, strong, moderate, weak are >2.0, 0.5 to 2.0, 0.1 to 0.5, 0.02 to 0.1 and < 0.02 respectively. These are provided in a bar chart representation. By hovering on any bar within the chart, the specific variable’s IV values can be seen.

Clustering :

This screen allows users to configure the VarClus Clustering Algorithm:

1. Number of Clusters or Variance Retention:
• Specify the desired number of clusters or the proportion of variance to retain.
• The algorithm will stop splitting clusters once either condition is met.
2. IV Threshold:
• Define an Information Value (IV) threshold. Variables falling outside this threshold will be excluded from the clustering process.
3. Clustering Method:
• Choose between clustering on the Weight of Evidence (WoE) of binned variables or on the original values of the variables.

Once the algorithm is executed, this screen displays:

1. Cluster Summary Table (Middle):
• Displays the number of clusters formed and the proportion of variance explained by the first principal component (PC1) of each cluster.
• Users can click on any cluster to manually select or change the variable that represents it.

2. Final Variable Table (Table 2):
• Lists the final set of variables selected for model building. All other variables will be discarded if the user saves the clustering results.

3. Summary Text Box (Right):
• Shows the proportion of variance explained by the final set of variables across the overall dataset.

Click “Save” button, to save the clustering results and to proceed with the final variable set.
Click “Skip” button to discard the clustering results and to retain the original variable set.

Correlation and multicollinearity

This tab provides the correlation values between the target variable and all the independent variables. Using the measure of Variance Inflation Factor (VIF) the variables that fall into very high (VIF > 10), high (VIF between 5 & 10), moderate (VIF between 1.5 & 5) and low (VIF < 1.5) are listed.

The buttons “Top 10 variables most with target” and “Highly variable pairs” provide the respective information in pop-ups as shown below.

Model Comparison: Summary

– User can create upto 3 model max and compare them .

– Perform Ensembling, Compare Models with KS & Gini and Select Final Model to be used.

Train a Model Section

Model Settings and Root Node

Once all the input data has been finalized, the next stage is to build the model. The first screen provides the record count, target count and target rate for the train and test samples. This root node will always be called as Node “ID 0”.

Before starting a model build, User has the option to customize the following parameters by clicking on the settings symbol on the top right corner of the screen (highlighted using a small blue circle above):

1. Global parameters: node size, maximum split levels, pairwise correlation limits, VIF and IV
2. Score scaling: Base Score, Base Odds and PDO (Probability of double odds)
3. Decision tree parameters: Minimum cases (#) or targets (# or %) in a node
4. Logistic model parameters: p-value limit
5. Random Forest and XGBoost parameters

Once the user has input the desired settings, these are saved by clicking on the “Update” and “Save Settings” button on the respective pop-up boxes.

Once the settings are finalized, the user can click on root node to get multiple options to either grow the decision tree (“Auto Grow”) or build an AI based model (“Run Model.ai”). In both these cases, the user does not have any control over the resulting tree and segmentation.

Auto Grow

When the “Auto Grow” option is selected, a tree is developed starting from the root node and progressing to subsequent levels based on robust separation within each level and across different levels.

By Using ‘+’ button we will get new screen of root node of tree, here we can run different operations based on our requirement

Summary of all models look like below screen.

Based on user’s final selection, that selected model or by default Model 1 will be selected. Then for further steps, click on “Evaluate Your model”.

If user does not wish to split a particular node into sub-classes, they can go back to that node and click on “collapse node” option.

To insert an additional split within a node, user can click on “Add you splits” within that node. User gets the options to choose any variable to be split then click on “split node”; the IV values are provided to make an informed decision. Do note that the variables with the highest IVs are placed at the top.

Once the automatically generated splits are available, user can generate custom splits within this variable by clicking on “add your splits”. This leads to a dialog box where user can assign the variable ranges in integer-based groups.

Another way to split is to “Specify the split point by comma-separated” values in the right of the screen. Press “Submit” after that. Click on Expand Node.

Run Model.ai

In this option, the user just clicks on “Run Model.ai” button, select the model type and the results are automatically generated. User does not have the option to split or collapse these nodes.

User can click on “Click to view model.ai output” to see the following results for the node on which model.ai was run:

1. Node details, variable importance, model performance metrics & target rate by score.

2. Logistic regression technical details

3. Scorecard details for that node

4. User can click on “click to see the graph” in the model performance metrics tab to get a visual representation of KS and Gini values for the train and test samples.

1. Random Forest Model Tree

2. Random Forest provides a bar graph of variable importance and a SHAP chart, which can be viewed by clicking on ‘Click to view SHAP Chart.’

3. User can click on “click to see the graph” in the model performance metrics tab to get a visual representation of KS and Gini values for the train and test samples.

1.XGBoost model Tree

2. XGBoost provides a bar graph of variable importance and a SHAP chart, which can be viewed by clicking on ‘Click to view SHAP Chart.’ It also provides KS and GINI charts.

3. User can click on “click to see the graph” in the model performance metrics tab to get a visual representation of KS and Gini values for the train and test samples.

Evaluate Your Model

Once the model has been built, the next stage involves analysing the performance of the mode. This section provides a description of the various performance metrics available in ID.ai to evaluate the model. The headings in the bullets refer to the buttons within the tool.

Model summary by Node – information on all nodes used in the model, the counts and bad rates in both Train and Test samples.

• Model summary by score – counts and bad rates for each score bin and raw score for both Train and Test samples.
• KS & Gini chart – visual representation of KS and Gini values for both Train and Test samples

• Model scorecard – this is used to scale the scores for the various nodes based on predefined base score values, odds and base score and probability of double odds.

– User can either see the scores generated for the leaf nodes by clicking on the “Score card for Leaf Nodes” button
OR
– Scores for the AI Logistic Regression Model node by clicking on the “Score card for Node ID 04” button

– To view shapely values for the AI Random Forest or XGBoost model, click on the “Shapely values for Node ID 04” button

• Deployment : When user clicks on “Click to deploy”, they will get “API Endpoint” along with sample payload and sample response. They can run “API Endpoint” to get Json response.

Build Your Decision

Once the scorecard has been developed, the user can upload an OOT (out-of-time) dataset to simulate the decision based on the previously developed model. Simulation(s) can be done on the overall score or a particular sub-segment.

The first step is to upload a fresh OOT/unseen dataset which should contain ALL the variables in the model build stage. User needs to upload this file from the location to the tool as shown in the screenshot below. (The file specifications are the same as described earlier in Section 4.1.1)

If the OOT/unseen dataset uploaded does not contain ALL the columns from the model that was built and selected, it would display an error message as shown below. Kindly upload an appropriate dataset on which to build a decision using the model built in Step 3.

Data can also be added using Databricks. User needs to follow the below steps:

1. Enter the Host URL, Access Token and the Http Path in the allocated fields.

2. Click on Test Connection

3. Once it shows “Connection successful”, click on “Connect”.

4. User can choose the table from the left side from the database of need (refer image).
5. Write a custom SQL Query in the editor provided to get data from the table based on user criteria. Once done, click on “Run Query” (refer image). This is an optional step.
6. The source of the data shown can be changed through the source dropdown.
7. Once satisfied with the data extracted, click on “Next” button.

8. The below shown dialog box shall appear. If want to proceed further with this selection, click “Yes”, else click on “Cancel”.

If you want to terminate the connection, click on the “Back” button as shown in the image and click on “Yes Terminate Connection” when prompted.

The next screen provides the population stability and characteristic stability.

The Population Stability Index (PSI) measures how much the data used to train the model differs from new, unseen data. It helps check if the model’s predictions are still reliable over time. A high PSI means the data has changed a lot, which could affect the model’s performance.

User has the option to choose a distribution option between Overall, Decision Tree Leaf Nodes and the Specific Node using the drop-down menu in “Choose Distribution Option”.

The Characteristic Stability Index (CSI) measures how much a variable’s behavior/distribution has changed between the data used to train the model and new, unseen data. It helps check if the patterns the model learned are still consistent or if they’ve shifted over time. A high CSI means there’s a big difference, which could affect the model’s performance.

The equivalent drop-down menu option for characteristic stability is available only when Model.ai is run.

Option A – Define Actual Target

“Define Actual Target” button in Population stability section This approach enables the original model build dataset’s target variable to be used as the target variable in the newly uploaded dataset also.

1. Click on “Define Actual Target” button located at top right of the screen
2. Click on “Choose the variable containing actual performance” to select the target variable OR Goto Step 5
3. Then select the checkbox in “List of Distinct Values” to specify which values within the target variables should be considered as the target value and shift to “Values to be treated Target” section using “>” button
4. Then click on “Apply Changes” for validation
5. Click on “Without Actual Target” (refer image, blue rectangle)

In this approach, the OOT sample dataset does not have the target variable. The target is predicted by applying the model on the OOT dataset and predicting the target variable for each record based on the values of the independent variables.

Option B – Reject Inferencing

The next screen provides Reject Inferencing. It is a method for improving the quality of a model by incorporating data from previously rejected/unavailable records. Bias can result if a credit scorecard model is built only on accepts and does not account for applications rejected that were rejected and hence have an unknown target status.

• If you want to proceed to next screen, click “Without Reject Inferencing” button (refer Image, blue rectangle).

• Steps to perform:

1. Click on “Choose the variable containing past decisions” and select the appropriate variable
2. After that, select the value from “List of Distinct Values” which should be considered as approved and move to the right side using the ‘>’ button to ‘Values to be treated as pass decision = “Approved”’ section (refer image, blue oval)
3. You can modify the value of ‘Reject Inferencing Factor’ to an integer between 1 and 10 based on your specific needs. Default is kept at 3.
4. Click on “Apply Changes” button to perform Reject Inferencing.

It would display the output as represented in below image. Click on the “Back” button on bottom left corner to return to “Reject Inferencing” screen.

Decision Simulator

Create Metric:

– User can select the metrics from the list of “All Variables/Metrics” from their uploaded dataset (refer image, blue oval)
AND
– User can click on “+ Create new metric” to create new metrics of their own (refer image, blue rectangle)
OR
– Click on “Next” button to skip this and proceed further.

“+ Create new metric”

– Drag the operator into the canvas that is to be performed
– Can search for the name of the feature that is wanted for operating on from the “Search Variable” search box (top-left) Note* : Only numerical features from the dataset will be visible here
– Select the variable from the list on the left of the screen and drag onto the appropriate side of the operator. Numerical values can be used too along with the features available (as shown in the image)
– Give a distinct name to the variable which doesn’t conflict with existing variables from the dataset or the newly created metric.
– Click on “Add new calculation” to create another metric.
– The equation can be discarded by clicking on the small trash icon located beside the new metric name.
– Click on “Save Equations” to save the newly created metrics for further use.

Cut-off Decision Overall

This tab provides a summary of the scorecard performance when different cut-off thresholds are selected. This overall approach does not allow segmented cut-offs, which will be described in the next section.

This screen allows the user to view the score cut-offs and their impact on two pre-defined outcomes when compared to existing scenario:
1. Maintaining same target rate
2. Maintaining same approval rate

Users can also choose actual score point values to analyse the impact of this on target and approval rates.
The impact of the above selections can be seen in terms of:
1. Accept/Decline share before and after cut-off selection
2. Bad rates of Accept/Decline before and after cut-off selection

Impact can be seen graphically from the bar chart to view the population percentage and the metric
New Metrics created or added can be selected from “Choose Metric” dropdown to view their impact on the cut-off.

Cut-off Decisions – Segmented

This screen displays the total population or target that belongs to individual score bins. This option enables the user to set tailored cut-off decisions for specific segments by selecting a segmentation variable for individual score bins. This allows for decision making based on distinct groups within the chosen segmentation variable. To customise the score bins, user should input custom score values that they want to create bins for and click “Apply” (refer below image). This step can however be skipped and can proceed further with only the selection of segmentation variable.

The following screenshot shows the distribution of all the cases by score bin.

Let us assume that the user would like to have a segmented cut-off decision using a variable named “APP_PROD_CODE”. In this scenario, the user selects this variable in the Drop-down menu named “Choose segmented variable” available on the right side of the table (refer above image).

The count of records within each score bin split into the different categories of the APP_PROD_CODE is displayed here.

Option 1:

– ID.ai provides with pre-analysed recommendation based on the model built on this unseen dataset for the “To-Be Accept” and “To-Be Decline” in shade of green and red respectively. By default, the Predicted Target Rate (PTR) is taken to be 4.30%.

– User can modify this selection by clicking to deselect and select again based on their purpose.

– Click on “Apply” button to view the “Cut-off Impact” of the new segmentation.

– The results of this decision will be displayed including the population decline/accept counts and the bad rates.

– User can do multiple iterations of the accept/decline combinations and every time the impacts will be available.

– Once satisfied with the segmented cut-off decision, user can click on “Save Decision” to store this decision.

The > option can be used to toggle between expanded and compact view.

Option 2:

This option is to combine multiple categories of the segmented variable.

1. Click on “Click to edit” under Range/Binning.
2. In the ensuing pop-up screen assign the same integer values to those categories of the segmented variable that need to be combined.
3. Click on “Add Binning”

1. In the subsequent screen, click only on the specific boxes which will be accepted.
2. Once user clicks on “Apply” after doing the above, the results of this decision will be displayed including the population decline/accept counts and the bad rates.
3. User can do multiple iterations of the accept/decline combinations and every time the impacts will be available.
4. The impact can be viewed for different binning/segmentation created by changing “Choose Segment”.
5. Once satisfied with the segmented cut-off decision, user can click on “Save Decision” to store this decision.

Auto Documentation

Once the model-build and cut-off decisions have been completed, the final stage is the generation of the “Decision Tree Technical Document” which provides a comprehensive document that will be invaluable for audit trail purposes. The user just needs to click on “Generate document” and a MS-Word version of the document will be available in 2-3 minutes.

While almost the entire document is pre-filled with the relevant information, the following three sections within it need to be filled by the user before dissemination.

Executive Summary
Data Sources and Sampling
Decision Tree Fact Sheet

Since these sections involve the business context and user’s knowledge of the data, these ideally should be filled by the user.

Appendix A – Statistical Terms

p-value: Measure that quantifies the evidence against a null hypothesis. A low p-value suggests data is inconsistent with the null, potentially favouring an alternative hypothesis. Common significance thresholds are 0.05 or 0.01.
IV (Information Value): A numerical value that quantifies the predictive power of an independent continuous variable x in capturing the binary dependent variable y. IV is helpful for reducing the number of variables as an initial step in preparing for Logistic Regression, especially when there are a large number of potential variables. IV is based on an analysis of each individual independent variable in turn without considering other predictor variables.
WOE (Weight of evidence): Closely related to the IV value, WOE measures the strength of each grouped attribute in predicting the desired value of the Dependent Variable.
VIF (Variance Inflation Factor): A measure of multicollinearity among the independent variables in a multiple regression model.
OOT (Out of Time) Sample: Used to indicate a dataset from a period outside the original model build window; used to validate the accuracy of the model in other time periods.
Gini coefficient: Gini coefficient, commonly known as Gini, is a metric widely used to evaluate classification models. It ranges from 0 to 1, with zero representing perfect equality (no discrimination) and one representing perfect inequality (perfect discrimination). In the context of credit risk modelling, a higher Gini coefficient indicates better model performance in terms of its ability to accurately rank borrowers based on their creditworthiness.
K-S (“Kolmogorov-Smirnov”) Value: The KS value provides a measure of the discriminatory power of a model. It looks at the maximum difference between the distribution of cumulative events and cumulative non-events and is a way of comparing the cumulative sum of the positive and negative classes. It measures the maximum difference between the two over the range of predicted probabilities. A high KS score indicates that the model has a better separation between the positive (goods) and negative classes (bads).
Skewness: Measure the degree of asymmetry of a distribution.
Kurtosis: Measure of the peak height of a distribution.
Base score: The actual score point in the scaled scorecard which gives the base odds for the target variable to go into the desired state.
Base odds: The odds of the target variable to go into the desired state at the base score.
PDO (Probability of Double Odds): The actual score points difference needed to increase the odds of the target variable’s target rate twice.

Frequently Asked Questions

Q. What are the minimum specifications for a machine to install and run ID.ai?

A. The minimum configurations a system hosting ID.ai is >=16 GB RAM minimum, 200 GB Free disk space with MS Word Installed

Q. Can I save the project in a custom folder other than the default folder path provided in ID.ai?

A. No, currently this facility is not available; it will be enabled in a future version of ID.ai.

Q. Does ID.ai run on Macbooks also?

A. The current version of ID.ai runs only in the Windows environment. Future versions will be Apple OS compatible also.

Q. What statistical technique is used for building the model?

A. Logistic regression

Q. What should user do if activation is unsuccessful?

A. Please reach out to your company’s system administrator who purchased the license keys from Corestrat or drop a mail to solutions@corestrat.ai with your license key

Q. Where can I find the current project saved?

A. The default location is saved in the default local system path usually having the following location “C:\users\<machinename>\documents\Idai\<projectname>”

Q. Where can I find current autodocument to be saved.

A. The default location is saved in the default local system path usually having the following location “C:\users\<machinename>\documents\Idai\<projectname>\documents

Q. How can I start a new project while on one project?

A. In the home screen: Click on Home > All Projects > New Project

Q. How can I delete the project or dataset I uploaded?

A. There is an option to “delete” (trash can icon) under each project in the path above

Q. How is the performance of the model evaluated?

A. Evaluate Your Model

Q. Can I make predictions with new data after training the model?

A. Cut-off Decision Overall

Q. Can I use the model to score new customers or cases?

A. Cut-off Decisions – Segmented

Q. Can I save and export my trained model?

A. Yes. The trained model is saved in the default folder path provided in ID.ai as a file in ‘.pkl’ format.

Q. Can I generate scorecards from the trained model?

A. Yes. The scorecard generated from the trained model is available in the ‘Model Scorecard’ tab of the ‘Evaluate Your Model’ section.

Q. Does ID.ai automatically detect the target variable?

A. ID.ai suggests a list of potential target variables; user has the option to accept from those or use another target variable. Target Variable Selection

Q. Can the user select the features to be included in the model?

A. Yes. Independent Variables Exclusion

Q. Where can I get the template for metadata?

A. Under “Data Preprocessing” tab, on the top right use the button named “Add Meta Data”

Version History & Feedback

Version History:
Version No.	Date	Changes
1.0	17 Sep 2024	First version

Feedback:

Please reach out to us at solutions@corestrat.ai for any questions and issues.

Our office address is

# LGF, Tower ‘B’, Diamond District,
Old HAL airport Road,
Domlur, Bangalore 560008

Table of Contents

Introduction

Software Requirements Specifications

Installation of ID.ai

Getting Started

Upload Data Section

Project Creation & File Input

Import from Databricks

Existing Projects

Input Data Management

Select Variables Section

Target Variable Selection

Stratified Sampling

Independent Variables Exclusion

Independent Variable Insights

Train a Model Section

Model Settings and Root Node

Auto Grow

Run Model.ai

Evaluate Your Model

Build Your Decision

Option A – Define Actual Target

Option B – Reject Inferencing

Cut-off Decision Overall

Cut-off Decisions – Segmented

Auto Documentation

Appendix A – Statistical Terms

Frequently Asked Questions

Version History & Feedback

Privacy Policy

Home

ID.ai

Rule.ai

Geninsight.ai

About Us

Success Story

Contact Us

Blog

Work With Us