Data Collections

Working with InsightCloudSec Data Collection for Reusable Data Definitions

Data Collections simplify resource filtering, Insight analysis, and Bot configuration. This feature lets administrators build reusable data definitions -- collections of strings -- that can be reused when creating and updating Insights and Bots. A Data Collection can be associated with any number of the hundreds of filters in the product.

When you edit a Data Collection, all Insights and Bots that use that collection will automatically use the updated collection next time they run; you needn't repeat your edits across multiple Insights and Bots.

Use Case Example

Say you want to specify a list of trusted accounts, and disallow certain kinds of activity from all other accounts. You might set up Bots configured with these Insights:

  • Resource With Cross Account Access to Unknown Account
  • Network Peers Connected to Unknown Accounts
  • Service Role Trusting Unknown Account
  • Cloud Role Trusting Unknown Account

You'll configure all of these Bots with the same set of account numbers. Manually entering and updating the list of allowed accounts in each Bot is tedious and error-prone; with a Data Collection, you maintain the list in one place, even when it holds tens of thousands of strings.

A Data Collection can contain up to 4MB of strings, allowing you to manage tens of thousands of entries, defining the behavior of many Insights and Bots, in a single list. The admin creating or editing a Data Collection is responsible for ensuring the integrity of the collection, e.g., a list of accounts must contain only valid account numbers; Data Collections will not validate the entered lists.
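
Because Data Collections do not validate their contents, it can help to check a list before loading it. The following is a minimal sketch, not part of the product: it assumes the entries are AWS-style 12-digit account IDs and approximates the 4MB limit as the UTF-8 size of all values and descriptions. The names `find_invalid_accounts` and `approximate_size_bytes` are hypothetical helpers introduced for illustration.

```python
import re

# Assumption: entries are AWS-style 12-digit account IDs; adjust the pattern
# for other providers or other kinds of values.
ACCOUNT_ID_PATTERN = re.compile(r"\d{12}")

# Assumption: the 4MB limit is approximated as the UTF-8 size of all values
# and descriptions; the product may measure it differently.
MAX_COLLECTION_BYTES = 4 * 1024 * 1024


def find_invalid_accounts(entries):
    """Return the sorted values in `entries` that are not 12-digit IDs."""
    return sorted(v for v in entries if not ACCOUNT_ID_PATTERN.fullmatch(v))


def approximate_size_bytes(entries):
    """Estimate the collection size from a value -> description mapping."""
    return sum(
        len(value.encode("utf-8")) + len((description or "").encode("utf-8"))
        for value, description in entries.items()
    )


# Made-up example data: value -> description (None means no description).
trusted_accounts = {
    "123456789012": "Example production account",
    "210987654321": None,
    "12345": "Too short, so it is reported as invalid",
}

invalid = find_invalid_accounts(trusted_accounts)
size_ok = approximate_size_bytes(trusted_accounts) <= MAX_COLLECTION_BYTES
```

Running a check like this before each update keeps a malformed entry from silently changing the behavior of every Insight and Bot that uses the collection.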

The Data Collections Page

You can access the Data Collections page from the Cloud section of the navigation menu. On this page, you can:

  1. Add new collections -- select New Collections in the upper right-hand corner.
  2. Delete outdated collections -- check the box to the left of the collection to be deleted, then select the trash can that appears above the collections list.
  3. View and edit your collections -- select a collection by clicking its name (shown in blue text), then edit the entries or add descriptions for them.

Data Collections Landing Page

The view of collection entries consists of:

  1. A space to enter the input string (e.g., an account number).
  2. A description -- this can help you understand why items have been added to your Data Collection and assist in auditing and maintaining the collections.

Data Collections Example - Trusted Third Party Accounts

Using Data Collections

Note: The process described below uses Data Collections to filter Resources. The same process applies when working with Insights or Bots. For entering and editing large lists, refer also to the API document for Data Collections.

Creating a New Data Collection

1. Access the Resources page. Select 'Filters' in the upper right-hand corner. This opens the Filters pane.
2. Use the search bar to find the name of the filter of interest, e.g., Resource Trusting Unknown Account.

Searching for Filters

3. Enter tags, names, or other strings to configure the Filter.

4. Select Create to create a new Data Collection containing these inputs.

Create New Data Collection

5. Use the Create modal to name the new collection.

Naming the New Data Collection

Importing a CSV

Alternatively, data collections can be created by importing a CSV file:

1. From the Data Collections page, select "New Collections" in the upper right-hand corner.

2. Select "Choose a file" to import your CSV.

Importing a Data Collection CSV

3. The CSV should be formatted using two columns for value-description pairs (not key-value pairs).

  • Each value has a max character limit of 255.
  • Each value must be unique within a data collection; duplicate values are ignored.
  • Empty values are automatically excluded.
  • The description is not required and does not have a character limit.

Example - Approved AMIs in the value column and image name in the description column.

4. After selecting the CSV and naming the new data collection, choose Create to complete importing.

New Data Collection
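
The CSV rules in step 3 can be applied before upload. The sketch below is one possible way to prepare such a file, under stated assumptions: it writes no header row (the rules above do not say whether one is expected), the AMI IDs are made up, and `prepare_rows` and `to_csv_text` are hypothetical helper names.

```python
import csv
import io

MAX_VALUE_LENGTH = 255  # per-value character limit stated in step 3


def prepare_rows(pairs):
    """
    Split (value, description) pairs into importable rows and rejected values.

    Empty and duplicate values are skipped, since the import drops them
    anyway; values longer than 255 characters are returned in `rejected`.
    """
    rows, rejected, seen = [], [], set()
    for value, description in pairs:
        if not value or value in seen:
            continue
        if len(value) > MAX_VALUE_LENGTH:
            rejected.append(value)
            continue
        seen.add(value)
        rows.append((value, description or ""))
    return rows, rejected


def to_csv_text(rows):
    """Render rows as two-column CSV text: value, then description."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up example input, including entries the rules would exclude.
pairs = [
    ("ami-0abc1234", "example base image"),
    ("ami-0abc1234", "duplicate of the first value"),
    ("", "empty value"),
    ("ami-0def5678", None),
    ("x" * 300, "longer than 255 characters"),
]
rows, rejected = prepare_rows(pairs)
csv_text = to_csv_text(rows)
```

Reporting over-long values separately, rather than dropping them silently, makes it easier to notice entries that would not import as intended.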

Using an Existing Data Collection

In the Resource Filters pane, you can select an existing Data Collection.

Using Data Collections Programmatically

Many use cases for Data Collections -- such as maintaining lists of hundreds or thousands of allowed accounts -- are best handled programmatically. To support these use cases, we've written the module provided below, which uses our REST API to create, populate, and update Data Collection contents. You can import these functions as a module in your management scripts, or copy and paste them as snippets.

"""
This collection of functions can help you programmatically manipulate Data
Collections in InsightCloudSec using our REST API.

These functions are intended to be copied and pasted into your code, or can be
imported and used directly in your tools.
"""

from collections.abc import Mapping
import json
import os
import requests


# Assumes the presence of these environment variables:
# INSIGHTCLOUDSEC_API_USER_USERNAME -- user we want to authenticate as
# INSIGHTCLOUDSEC_API_USER_PASSWORD -- user's password
# INSIGHTCLOUDSEC_BASE_URL -- base url for the InsightCloudSec instance to run against
# INSIGHTCLOUDSEC_API_KEY -- API Key to interact with the InsightCloudSec REST API
USERNAME, PASSWORD, BASE_URL, API_KEY = (os.environ['INSIGHTCLOUDSEC_API_USER_USERNAME'],
                                os.environ['INSIGHTCLOUDSEC_API_USER_PASSWORD'],
                                os.environ['INSIGHTCLOUDSEC_BASE_URL'],
                                os.environ['INSIGHTCLOUDSEC_API_KEY'])
BASE_HEADERS = {'Content-Type': 'application/json;charset=UTF-8',
                'Accept': 'application/json',
                'Api-Key': API_KEY}


def create_data_collection(collection_name, collection_data=None):
    """
    Create a new data collection with name `collection_name`, optionally
    populated with the values in the dictionary `collection_data`.

    `collection_data` should be one of 2 things:
    - a dictionary mapping Data Collection values to descriptions. Using `None`
      as a description will result in the description being set to the empty
      string.
    - an iterable of strings, which will be inserted as values with no
      descriptions.

    For example, the following is a valid input:

    ```
    {
        'value one': 'description for value one',
        'value two': None,  # description will be set to the empty string
        'value three': 'description for value three'
    }
    ```

    Or you can simply pass a list like `['first value', 'second value']`, which
    is equivalent to passing `{'first value': None, 'second value': None}`
    """

    data = {'collection_name': collection_name,
            'collection_data': normalize_collection(collection_data)}

    return requests.post(
        url=requests.compat.urljoin(BASE_URL, '/v2/datacollections/'),
        headers=BASE_HEADERS,
        data=json.dumps(data)
    )

def update_data_collection(collection_id, collection_data):
    """
    Update the existing data collection with integer ID `collection_id` using
    the data in `collection_data`.

    `collection_data` should be a dictionary mapping values to descriptions.
    Any value not already in the Data Collection will be inserted with its
    description; any value that already exists will have its description
    updated.

    Descriptions may be `None`. A description equaling `None` in a new entry
    will result in an empty description. For existing entries, a description
    equaling `None` will result in no changes to an existing description;
    setting a description to the empty string must be done explicitly. For
    example:

    ```
    {
        # set the description for an existing value 'value one', or create a
        # new value with that description
        'value one': 'description for value one',
        # leave the description for an existing value 'value two' unchanged, or
        # create a value 'value two' with no description
        'value two': None,
        # empty the description for an existing value 'value three', or create
        # a new value with an empty description
        'value three': ''
    }
    ```

    Note that this operation does not remove any entries from the data
    collection.
    """
    url = requests.compat.urljoin(
        BASE_URL,
        requests.compat.urljoin('/v2/datacollections/', str(collection_id))
    )

    result = requests.post(
        url=url,
        headers=BASE_HEADERS,
        data=json.dumps({'collection_data': normalize_collection(collection_data)})
    )
    return result


def delete_data_collection_values(collection_id, values_to_delete):
    """
    Delete all entries with values in the iterable `values_to_delete`.

    Note that this is a 2-phase operation: this first checks that the values
    exist and gets their IDs within the collection, then sends the request to
    delete them. This means that calling this method concurrently with other
    data collection manipulation could have unexpected results.
    """
    url = requests.compat.urljoin(
        BASE_URL,
        requests.compat.urljoin('/v2/datacollections/', str(collection_id))
    )

    # phase 1: grab existing entries
    collection_result = requests.get(
        url=url, headers=BASE_HEADERS
    )
    existing_values_to_ids = {
        datum['value']: int(datum['id'])
        for datum in collection_result.json()['collection']['data']
    }

    # pre-deletion check: we should only try to delete entries that exist
    # (`<=` rather than `<`, so deleting every entry in the collection is allowed)
    if not set(values_to_delete) <= set(existing_values_to_ids):
        raise ValueError(
            'Some values to be deleted not in existing data '
            'collection: {}'.format(set(values_to_delete) - set(existing_values_to_ids))
        )
    # phase 2: delete specified entries
    return requests.delete(
        url=url,
        headers=BASE_HEADERS,
        data=json.dumps({
            'data_ids': [existing_values_to_ids[value] for value in values_to_delete]
        })
    )

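def normalize_collection(collection_data):
    """
    Return `collection_data` as a dictionary mapping values to descriptions.
    """
    # `create_data_collection` defaults to `None`, meaning "no initial entries"
    if collection_data is None:
        return {}
    if isinstance(collection_data, Mapping):
        return collection_data
    # any other iterable of strings becomes values with no descriptions
    return {datum: None for datum in collection_data}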
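
Since `update_data_collection` never removes entries and `delete_data_collection_values` raises if asked to delete a value that does not exist, making a collection match a desired list takes two calls. The sketch below shows one way to plan them; `plan_sync` is a hypothetical helper, not part of the module above, and it performs no API requests itself.

```python
def plan_sync(existing_values, desired):
    """
    Work out how to make a Data Collection match `desired`.

    `existing_values` is an iterable of the values currently in the
    collection; `desired` maps each wanted value to a description (or None).
    Returns `(to_upsert, to_delete)`: a mapping suitable for an update call
    and a sorted list of existing values that are no longer wanted.
    """
    to_upsert = dict(desired)
    to_delete = sorted(set(existing_values) - set(to_upsert))
    return to_upsert, to_delete


# Made-up example: one account is dropped, one kept, one added.
existing = ["111111111111", "222222222222"]
desired = {"222222222222": None, "333333333333": "new partner account"}

to_upsert, to_delete = plan_sync(existing, desired)

# With the module above imported, the plan could then be applied as:
#   update_data_collection(collection_id, to_upsert)
#   if to_delete:
#       delete_data_collection_values(collection_id, to_delete)
```

Because `to_delete` only ever contains values taken from the existing collection, the pre-deletion check in `delete_data_collection_values` should pass unless the collection changes between reading it and applying the plan.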
