Data Providers

class gordo_core.data_providers.base.GordoBaseDataProvider

Bases: object

abstract can_handle_tag(tag: str | SensorTag)

Returns true if the data provider thinks it may be able to read this tag. Typically checks whether the asset part of the tag is known to the reader.

classmethod from_dict(config: dict[str, Any], *, back_compatibles: dict[tuple[Optional[str], str], tuple[Optional[str], str]] | None = None) → GordoBaseDataProvider

get_closest_datapoint(tag: str | SensorTag, before_time: datetime, point_max_look_back: Timedelta) → Series | None

Returns the latest data point of tag within the look-back window (bounded by point_max_look_back) that ends at before_time, or None if nothing is found. Implementing this function in child classes is optional; if it is not implemented, NotImplementedError is raised.
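The look-back behaviour described above can be sketched with the standard library alone. This is an illustration, not the library's implementation: the helper name `closest_datapoint`, the use of plain `(timestamp, value)` tuples instead of a pandas Series, and the choice to treat the window as `[before_time - max_look_back, before_time)` are all assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple

# A data point is modelled as (timestamp, value); the real method returns a pandas Series.
Point = Tuple[datetime, float]


def closest_datapoint(
    points: Sequence[Point],
    before_time: datetime,
    max_look_back: timedelta,
) -> Optional[Point]:
    """Return the latest point inside the look-back window, or None.

    Assumption: the window includes its start and excludes before_time itself.
    """
    earliest = before_time - max_look_back
    candidates = [p for p in points if earliest <= p[0] < before_time]
    if not candidates:
        return None
    # The "closest" point is the one with the largest timestamp in the window.
    return max(candidates, key=lambda p: p[0])
```

A point older than the window is ignored even if it is the only one available, which is why the function can return None for a tag that does have data.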

get_metadata()

Get metadata about the current state of the data provider.

abstract load_series(train_start_date: datetime, train_end_date: datetime, tag_list: list[Union[str, gordo_core.sensor_tag.SensorTag]], dry_run: bool | None = False, **kwargs) → Iterable[Tuple[Series, str | SensorTag]]

Load the required data as an iterable of series, each holding the values of one tag indexed by time.

Parameters:
  • train_start_date – Datetime object representing the start of fetching data

  • train_end_date – Datetime object representing the end of fetching data

  • tag_list – List of tags to fetch, where each will end up being its own series

  • dry_run – Set to true to perform a “dry run” of the loading. It is up to each implementation to determine what that means.

  • kwargs – Additional data that may be passed on by the data provider.
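As a rough illustration of the contract described above, the sketch below defines a stand-in abstract base and a toy in-memory subclass. `BaseProviderSketch` and `InMemoryProvider` are hypothetical names, not part of gordo_core. Plain `(timestamp, value)` lists take the place of pandas Series so that the example runs without third-party packages, and both the half-open `[start, end)` interval and the dry-run behaviour (yield empty series) are assumptions.

```python
from abc import ABC, abstractmethod
from datetime import datetime


class BaseProviderSketch(ABC):
    """Stand-in for the abstract interface; NOT the real GordoBaseDataProvider."""

    @abstractmethod
    def can_handle_tag(self, tag):
        ...

    @abstractmethod
    def load_series(self, train_start_date, train_end_date, tag_list, dry_run=False, **kwargs):
        ...


class InMemoryProvider(BaseProviderSketch):
    """Toy provider serving data from a dict of tag -> [(timestamp, value), ...]."""

    def __init__(self, data):
        self._data = data

    def can_handle_tag(self, tag):
        # "Known to the reader" here simply means the tag is a key in the dict.
        return tag in self._data

    def load_series(self, train_start_date, train_end_date, tag_list, dry_run=False, **kwargs):
        # Generator: tags are validated lazily, as each series is requested.
        for tag in tag_list:
            if not self.can_handle_tag(tag):
                raise ValueError(f"Unknown tag: {tag}")
            if dry_run:
                # Assumption: a dry run checks the tag but loads no values.
                yield [], tag
                continue
            points = [
                (ts, value)
                for ts, value in self._data[tag]
                if train_start_date <= ts < train_end_date
            ]
            yield points, tag
```

Each yielded item pairs a series with the tag it belongs to, mirroring the `Iterable[Tuple[Series, str | SensorTag]]` return annotation.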

tag_normalizer(sensors: list[Union[dict[str, Optional[str]], str, gordo_core.sensor_tag.SensorTag]], **kwargs: str | None) → list[Union[str, gordo_core.sensor_tag.SensorTag]]

Prepare and validate the sensors list. Subclasses may find it useful to override this function.

tags_required_fields = ()

to_dict()

Serialize this object into a dict representation, which can be used to initialize a new object after popping type from the dict.
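The round trip between `to_dict` and a constructor call can be sketched as follows. `ProviderSketch` is a hypothetical class, not part of gordo_core, and the sketch leaves out any class lookup and the `back_compatibles` argument of the real `from_dict`; it only shows the "pop `type`, then initialize" step mentioned above.

```python
from typing import Any


class ProviderSketch:
    """Hypothetical provider used only to illustrate the round trip."""

    def __init__(self, measurement: str, value_name: str = "Value"):
        self.measurement = measurement
        self.value_name = value_name

    def to_dict(self) -> dict[str, Any]:
        # The "type" entry identifies the class; the rest are init arguments.
        return {
            "type": type(self).__name__,
            "measurement": self.measurement,
            "value_name": self.value_name,
        }

    @classmethod
    def from_dict(cls, config: dict[str, Any]) -> "ProviderSketch":
        params = dict(config)  # copy, so the caller's dict is left untouched
        params.pop("type", None)  # "type" is not an init argument
        return cls(**params)
```

Copying the config before popping `type` is a design choice in this sketch: it lets the same dict be reused to build several objects.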

class gordo_core.data_providers.providers.InfluxDataProvider(measurement: str, value_name: str = 'Value', api_key: str | None = None, api_key_header: str | None = None, client: DataFrameClient | None = None, uri: str | None = None, **kwargs)

Bases: GordoBaseDataProvider

Parameters:
  • measurement – Name of the measurement to select from in Influx

  • value_name – Name of the value to select; defaults to ‘Value’

  • api_key – API key to send in the request header

  • api_key_header – Name of the header in which the API key is sent with requests

  • uri – Create a client from a URI format: <username>:<password>@<host>:<port>/<optional-path>/<db_name>

  • kwargs – These are passed directly to the init args of influxdb.DataFrameClient
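To make the `uri` format above concrete, the following sketch splits such a string into its parts using only the standard library. The helper `split_influx_uri` and the `InfluxUriParts` tuple are hypothetical and are not how the provider itself parses the URI; the example values are made up.

```python
from typing import NamedTuple


class InfluxUriParts(NamedTuple):
    username: str
    password: str
    host: str
    port: int
    path: str  # empty string when the optional path is absent
    db_name: str


def split_influx_uri(uri: str) -> InfluxUriParts:
    """Split '<username>:<password>@<host>:<port>/<optional-path>/<db_name>'."""
    # Split on the last '@' so that an '@' inside the password is tolerated.
    credentials, _, location = uri.rpartition("@")
    username, _, password = credentials.partition(":")
    host_port, _, rest = location.partition("/")
    host, _, port = host_port.partition(":")
    # Everything after the last '/' is the database name; the rest is the path.
    path, _, db_name = rest.rpartition("/")
    return InfluxUriParts(username, password, host, int(port), path, db_name)
```

A URI without a port makes `int(port)` raise ValueError in this sketch; a real client would need to decide on a default or a clearer error.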

can_handle_tag(tag: str | SensorTag)

Returns true if the data provider thinks it may be able to read this tag. Typically checks whether the asset part of the tag is known to the reader.

get_list_of_tags() → list[str]

Queries Influx for the list of tags, using a TTL cache of 600 seconds. The cache can be cleared with provider.get_list_of_tags.cache_clear() as is usual with cachetools.

Returns:

The list of tags in Influx
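The caching behaviour described above (results reused for 600 seconds, with an explicit `cache_clear()`) can be illustrated with a small standard-library stand-in. This is not cachetools and not the provider's code: the decorator `ttl_cached`, the module-level `get_list_of_tags` function, and the returned tag names are all hypothetical.

```python
import time
from functools import wraps


def ttl_cached(ttl, timer=time.monotonic):
    """Cache the result of a zero-argument function for `ttl` seconds."""

    def decorator(func):
        state = {}  # holds "value" and "expires" once the cache is populated

        @wraps(func)
        def wrapper():
            now = timer()
            if "expires" not in state or now >= state["expires"]:
                state["value"] = func()
                state["expires"] = now + ttl
            return state["value"]

        # Emptying the state forces the next call to recompute.
        wrapper.cache_clear = state.clear
        return wrapper

    return decorator


calls = []  # records how often the underlying "query" really runs


@ttl_cached(ttl=600)
def get_list_of_tags():
    calls.append(1)  # stands in for a query against Influx
    return ["TAG-1", "TAG-2"]
```

Calling `get_list_of_tags()` twice in a row runs the query once; after `get_list_of_tags.cache_clear()` the next call queries again.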

load_series(train_start_date: datetime, train_end_date: datetime, tag_list: list[Union[str, gordo_core.sensor_tag.SensorTag]], dry_run: bool | None = False, **kwargs) → Iterable[Tuple[Series, str | SensorTag]]

See GordoBaseDataProvider for documentation.

read_single_sensor(train_start_date: datetime, train_end_date: datetime, tag: str, measurement: str) → Series

Parameters:
  • train_start_date (datetime) – Datetime to start querying for data

  • train_end_date (datetime) – Datetime to stop querying for data

  • tag (str) – Name of the tag to match in Influx

  • measurement (str) – Name of the measurement to select from

exception gordo_core.data_providers.providers.NoSuitableDataProviderError

Bases: ValueError

class gordo_core.data_providers.providers.RandomDataProvider(min_size=100, max_size=300)

Bases: GordoBaseDataProvider

Get GordoBaseDataset which returns unstructured values for X and y. Each instance uses the same seed, so loading should behave like a pure function (same input -> same output).
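The "same seed, same output" property can be demonstrated with the standard library. This sketch does not reproduce how RandomDataProvider generates its values; the function `random_values` and the seed value 0 are assumptions for illustration only.

```python
import random


def random_values(n: int, seed: int = 0) -> list[float]:
    """Return n pseudo-random floats from a generator seeded with `seed`."""
    # A private Random instance keeps the result independent of global state.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]
```

Because every call builds its own generator from the same seed, two calls with the same arguments return identical lists.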

can_handle_tag(tag: str | SensorTag)

Returns true if the data provider thinks it may be able to read this tag. Typically checks whether the asset part of the tag is known to the reader.

get_closest_datapoint(tag: str | SensorTag, before_time: datetime, point_max_look_back: Timedelta) → Series | None

Uses the same logic as the method in the parent class.

load_series(train_start_date: datetime, train_end_date: datetime, tag_list: list[Union[str, gordo_core.sensor_tag.SensorTag]], dry_run: bool | None = False, **kwargs) → Iterable[Tuple[Series, str | SensorTag]]

Load the required data as an iterable of series, each holding the values of one tag indexed by time.

Parameters:
  • train_start_date – Datetime object representing the start of fetching data

  • train_end_date – Datetime object representing the end of fetching data

  • tag_list – List of tags to fetch, where each will end up being its own series

  • dry_run – Set to true to perform a “dry run” of the loading. It is up to each implementation to determine what that means.

  • kwargs – Additional data that may be passed on by the data provider.

Data provider examples:

class gordo_core.data_providers.contrib.csv_provider.CSVDataProvider(file_path: str | Path, timestamp_column: str, sep: str = ',')

Bases: GordoBaseDataProvider

Parameters:
  • file_path – Path to a CSV file containing the data to be loaded.

  • timestamp_column – Column in the CSV file containing the timestamps for each row.

  • sep – Delimiter to use.

load_series(train_start_date: datetime, train_end_date: datetime, tag_list: list[Union[str, gordo_core.sensor_tag.SensorTag]], dry_run: bool | None = False, **kwargs) → Iterable[Tuple[Series, str | SensorTag]]

Load data from the CSV file.
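One plausible reading of "load data from the CSV file" is sketched below with the standard `csv` module. It is not the CSVDataProvider implementation: the function `load_series_from_csv` is hypothetical, and the sketch assumes one column per tag, ISO-formatted timestamps, numeric values, and a half-open `[start, end)` interval.

```python
import csv
import io
from datetime import datetime


def load_series_from_csv(csv_file, timestamp_column, tag_list, start, end, sep=","):
    """Yield (series, tag) pairs, where a series is a list of (timestamp, value)."""
    # Read all rows once so that every tag can be extracted from the same data.
    rows = list(csv.DictReader(csv_file, delimiter=sep))
    for tag in tag_list:
        series = []
        for row in rows:
            ts = datetime.fromisoformat(row[timestamp_column])
            if start <= ts < end:
                series.append((ts, float(row[tag])))
        yield series, tag


# Example input; the column names and values are made up.
example_csv = io.StringIO(
    "time;TAG-1;TAG-2\n"
    "2020-01-01 00:00:00;1.0;10.0\n"
    "2020-01-02 00:00:00;2.0;20.0\n"
    "2020-01-03 00:00:00;3.0;30.0\n"
)
```

Passing `example_csv` with `timestamp_column="time"` and `sep=";"` yields one series per requested tag, restricted to the requested time range.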