Data Providers

class gordo_core.data_providers.base.GordoBaseDataProvider

Bases: object

abstract can_handle_tag(tag: str | SensorTag): Returns True if the data provider thinks it can possibly read this tag. Typically checks whether the asset part of the tag is known to the reader.

classmethod from_dict(config: dict[str, Any], *, back_compatibles: dict[tuple[Optional[str], str], tuple[Optional[str], str]] | None = None) → GordoBaseDataProvider

get_closest_datapoint(tag: str | SensorTag, before_time: datetime, point_max_look_back: Timedelta) → Series | None: Returns the latest data point of the tag within the look-back window ending at before_time, or None if nothing is found. Implementing this method in child classes is optional; if it is not implemented, NotImplementedError is raised.
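The look-back behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the gordo_core implementation: the helper name closest_datapoint and the (timestamp, value) tuples are hypothetical stand-ins for the method and its Series result, and treating the window as inclusive at both ends is an assumption, since the documentation does not specify the boundaries.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple

# Hypothetical stand-in for a one-row Series: (timestamp, value).
Point = Tuple[datetime, float]


def closest_datapoint(
    points: Sequence[Point],
    before_time: datetime,
    max_look_back: timedelta,
) -> Optional[Point]:
    """Return the latest point in [before_time - max_look_back, before_time].

    Inclusive bounds are an assumption made for this sketch.
    """
    earliest = before_time - max_look_back
    candidates = [p for p in points if earliest <= p[0] <= before_time]
    # Pick the candidate with the largest timestamp, if there is one.
    return max(candidates, key=lambda p: p[0]) if candidates else None


points = [
    (datetime(2024, 1, 1, 0, 0), 1.0),
    (datetime(2024, 1, 1, 6, 0), 2.0),
    (datetime(2024, 1, 2, 0, 0), 3.0),
]

# An 8-hour window ending at noon contains the 06:00 point.
found = closest_datapoint(points, datetime(2024, 1, 1, 12, 0), timedelta(hours=8))
# A 1-hour window ending at noon contains nothing.
missing = closest_datapoint(points, datetime(2024, 1, 1, 12, 0), timedelta(hours=1))
```

A provider that cannot answer such a query cheaply can leave the method unimplemented, in which case callers should be prepared for NotImplementedError.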

get_metadata(): Get metadata about the current state of the data provider.

abstract load_series(train_start_date: datetime, train_end_date: datetime, tag_list: list[Union[str, gordo_core.sensor_tag.SensorTag]], dry_run: bool | None = False, **kwargs) → Iterable[Tuple[Series, str | SensorTag]]

Load the required data as an iterable of series where each contains the values of the tag with time index.

Parameters:

train_start_date – Datetime object representing the start of fetching data
train_end_date – Datetime object representing the end of fetching data
tag_list – List of tags to fetch, where each will end up being its own series
dry_run – Set to true to perform a “dry run” of the loading. Up to the implementations to determine what that means.
kwargs – Additional keyword arguments through which the data provider may pass extra data.

tag_normalizer(sensors: list[Union[dict[str, Optional[str]], str, gordo_core.sensor_tag.SensorTag]], **kwargs: str | None) → list[Union[str, gordo_core.sensor_tag.SensorTag]]: Prepares and validates the sensors list. Subclasses may find it useful to override this method.

tags_required_fields = ()

to_dict(): Serialize this object into a dict representation, which can be used to initialize a new object after popping type from the dict.
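The round trip described for to_dict can be sketched as follows. ExampleProvider is a hypothetical stand-in, not a gordo_core class. Storing the class name under a "type" key is an assumption made for illustration; the real method may store a different value there.

```python
from typing import Any, Dict


class ExampleProvider:
    """Hypothetical stand-in used only to illustrate the round trip."""

    def __init__(self, measurement: str, value_name: str = "Value"):
        self.measurement = measurement
        self.value_name = value_name

    def to_dict(self) -> Dict[str, Any]:
        # Assumption: the type is stored under a "type" key next to the init args.
        return {
            "type": type(self).__name__,
            "measurement": self.measurement,
            "value_name": self.value_name,
        }


original = ExampleProvider("sensors")
config = original.to_dict()

# Pop the type entry first; what remains matches the init arguments.
provider_type = config.pop("type")
clone = ExampleProvider(**config)
```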

class gordo_core.data_providers.providers.InfluxDataProvider(measurement: str, value_name: str = 'Value', api_key: str | None = None, api_key_header: str | None = None, client: DataFrameClient | None = None, uri: str | None = None, **kwargs)

Bases: GordoBaseDataProvider

Parameters:

measurement – Name of the measurement to select from in Influx
value_name – Name of the value to select, defaults to ‘Value’
api_key – API key to use in the header
api_key_header – Name of the header in which the API key is sent with requests
uri – Create a client from a URI format: <username>:<password>@<host>:<port>/<optional-path>/<db_name>
kwargs – These are passed directly to the init args of influxdb.DataFrameClient
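To make the uri format concrete, the sketch below splits such a string into its parts using only the standard library. The helper parse_influx_uri is hypothetical and is not the parser used by InfluxDataProvider; it only illustrates how the optional path and the database name share the trailing portion of the string.

```python
from typing import Dict, Union
from urllib.parse import urlsplit


def parse_influx_uri(uri: str) -> Dict[str, Union[str, int, None]]:
    """Split <username>:<password>@<host>:<port>/<optional-path>/<db_name>."""
    # Prefixing "//" makes urlsplit treat the text as a network location.
    parts = urlsplit("//" + uri)
    # The last path segment is the database; anything before it is the optional path.
    prefix, _, db_name = parts.path.rpartition("/")
    return {
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "path": prefix,
        "db_name": db_name,
    }


with_path = parse_influx_uri("user:secret@localhost:8086/api/v1/sensors")
without_path = parse_influx_uri("user:secret@localhost:8086/sensors")
```

The credentials and host shown are placeholders. When the optional path is absent, the sketch reports it as an empty string.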

can_handle_tag(tag: str | SensorTag): Returns True if the data provider thinks it can possibly read this tag. Typically checks whether the asset part of the tag is known to the reader.

get_list_of_tags() → list[str]

Queries Influx for the list of tags, using a TTL cache of 600 seconds. The cache can be cleared with provider.get_list_of_tags.cache_clear() as is usual with cachetools.

Returns: The list of tags in Influx.
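The caching behaviour can be illustrated without Influx or cachetools. The decorator below is a simplified standard-library stand-in for a TTL cache on a zero-argument function. It is not the cachetools implementation, and the module-level get_list_of_tags here is a hypothetical function that returns fixed data instead of querying a database.

```python
import time
from functools import wraps


def ttl_cache(ttl_seconds: float):
    """Cache a zero-argument function's result for ttl_seconds."""

    def decorator(func):
        state = {"value": None, "expires": 0.0, "filled": False}

        @wraps(func)
        def wrapper():
            now = time.monotonic()
            # Recompute when nothing is cached or the entry has expired.
            if not state["filled"] or now >= state["expires"]:
                state["value"] = func()
                state["expires"] = now + ttl_seconds
                state["filled"] = True
            return state["value"]

        def cache_clear():
            state["filled"] = False

        wrapper.cache_clear = cache_clear
        return wrapper

    return decorator


calls = []


@ttl_cache(ttl_seconds=600)
def get_list_of_tags():
    calls.append(1)  # stands in for a round trip to the database
    return ["tag-a", "tag-b"]


first = get_list_of_tags()
second = get_list_of_tags()  # served from the cache, no new query
get_list_of_tags.cache_clear()
third = get_list_of_tags()  # cache was cleared, so the query runs again
```

Clearing the cache is mainly useful when tags have just been added and waiting up to ten minutes for the entry to expire is not acceptable.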

load_series(train_start_date: datetime, train_end_date: datetime, tag_list: list[Union[str, gordo_core.sensor_tag.SensorTag]], dry_run: bool | None = False, **kwargs) → Iterable[Tuple[Series, str | SensorTag]]: See GordoBaseDataProvider for documentation.

read_single_sensor(train_start_date: datetime, train_end_date: datetime, tag: str, measurement: str) → Series

Parameters:

train_start_date (datetime) – Datetime to start querying for data
train_end_date (datetime) – Datetime to stop querying for data
tag (str) – Name of the tag to match in Influx
measurement (str) – Name of the measurement to select from

exception gordo_core.data_providers.providers.NoSuitableDataProviderError: Bases: ValueError

class gordo_core.data_providers.providers.RandomDataProvider(min_size=100, max_size=300)

Bases: GordoBaseDataProvider

Get GordoBaseDataset which returns unstructured values for X and y. Each instance uses the same seed, so it should behave as a pure function (same input -> same output).
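The "same input -> same output" property of a fixed seed can be sketched as below. The function random_series is hypothetical and does not reproduce how RandomDataProvider generates data. It assumes that min_size and max_size bound the number of generated values, and it uses an arbitrary seed of 0.

```python
import random
from typing import List


def random_series(min_size: int = 100, max_size: int = 300, seed: int = 0) -> List[float]:
    """Generate a pseudo-random series whose length lies in [min_size, max_size]."""
    # A fresh generator with a fixed seed makes every call reproducible.
    rng = random.Random(seed)
    size = rng.randint(min_size, max_size)
    return [rng.random() for _ in range(size)]


first_run = random_series()
second_run = random_series()
```

Because both calls start from the same seed, they produce identical values, which keeps tests that rely on the random data repeatable.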

can_handle_tag(tag: str | SensorTag): Returns True if the data provider thinks it can possibly read this tag. Typically checks whether the asset part of the tag is known to the reader.

get_closest_datapoint(tag: str | SensorTag, before_time: datetime, point_max_look_back: Timedelta) → Series | None: Uses the same logic as the method in the parent class.

load_series(train_start_date: datetime, train_end_date: datetime, tag_list: list[Union[str, gordo_core.sensor_tag.SensorTag]], dry_run: bool | None = False, **kwargs) → Iterable[Tuple[Series, str | SensorTag]]

Load the required data as an iterable of series where each contains the values of the tag with time index.

Parameters:

train_start_date – Datetime object representing the start of fetching data
train_end_date – Datetime object representing the end of fetching data
tag_list – List of tags to fetch, where each will end up being its own series
dry_run – Set to true to perform a “dry run” of the loading. Up to the implementations to determine what that means.
kwargs – Additional keyword arguments through which the data provider may pass extra data.

Data provider examples:

class gordo_core.data_providers.contrib.csv_provider.CSVDataProvider(file_path: str | Path, timestamp_column: str, sep: str = ',')

Bases: GordoBaseDataProvider

Parameters:

file_path – Path to a CSV file containing the data to be loaded.
timestamp_column – Column in the CSV file containing the timestamps for each row.
sep – Delimiter to use.

load_series(train_start_date: datetime, train_end_date: datetime, tag_list: list[Union[str, gordo_core.sensor_tag.SensorTag]], dry_run: bool | None = False, **kwargs) → Iterable[Tuple[Series, str | SensorTag]]: Load data from the CSV file.
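The general shape of a CSV-backed load_series, one (series, tag) pair per requested tag, can be sketched with the standard library. This is not the CSVDataProvider implementation. The function load_series_from_csv is hypothetical, lists of (timestamp, value) tuples stand in for Series objects, and the half-open date range and ISO 8601 timestamps are assumptions.

```python
import csv
import io
from datetime import datetime
from typing import Dict, Iterable, Iterator, List, TextIO, Tuple

# Hypothetical stand-in for a time-indexed Series.
Series = List[Tuple[datetime, float]]


def load_series_from_csv(
    handle: TextIO,
    timestamp_column: str,
    train_start_date: datetime,
    train_end_date: datetime,
    tag_list: Iterable[str],
    sep: str = ",",
) -> Iterator[Tuple[Series, str]]:
    """Yield one (series, tag) pair per tag, limited to the date range."""
    tags = list(tag_list)
    columns: Dict[str, Series] = {tag: [] for tag in tags}
    for row in csv.DictReader(handle, delimiter=sep):
        timestamp = datetime.fromisoformat(row[timestamp_column])
        # Assumption: start is inclusive and end is exclusive.
        if train_start_date <= timestamp < train_end_date:
            for tag in tags:
                columns[tag].append((timestamp, float(row[tag])))
    for tag in tags:
        yield columns[tag], tag


CSV_TEXT = """time;TAG-1;TAG-2
2024-01-01T00:00:00;1.0;10.0
2024-01-01T01:00:00;2.0;20.0
2024-01-02T00:00:00;3.0;30.0
"""

loaded = list(
    load_series_from_csv(
        io.StringIO(CSV_TEXT),
        timestamp_column="time",
        train_start_date=datetime(2024, 1, 1),
        train_end_date=datetime(2024, 1, 2),
        tag_list=["TAG-1", "TAG-2"],
        sep=";",
    )
)
```

The in-memory text and the tag names are made up for the example. With a real file, the handle would come from opening file_path instead of io.StringIO.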