Data Providers#
- class gordo_core.data_providers.base.GordoBaseDataProvider[source]#
Bases: object
- abstract can_handle_tag(tag: str | SensorTag)[source]#
Returns True if the data provider thinks it can possibly read this tag. Typically checks whether the asset part of the tag is known to the reader.
- classmethod from_dict(config: dict[str, Any], *, back_compatibles: dict[tuple[Optional[str], str], tuple[Optional[str], str]] | None = None) GordoBaseDataProvider[source]#
- get_closest_datapoint(tag: str | SensorTag, before_time: datetime, point_max_look_back: Timedelta) Series | None[source]#
Returns the latest data point of the tag from some time in the past up to before_time, or None if nothing is found. Implementing this method in child classes is optional; if it is not implemented, NotImplementedError is raised.
- abstract load_series(train_start_date: datetime, train_end_date: datetime, tag_list: list[Union[str, gordo_core.sensor_tag.SensorTag]], dry_run: bool | None = False, **kwargs) Iterable[Tuple[Series, str | SensorTag]][source]#
Load the required data as an iterable of series where each contains the values of the tag with time index.
- Parameters:
train_start_date – Datetime object representing the start of fetching data
train_end_date – Datetime object representing the end of fetching data
tag_list – List of tags to fetch, where each will end up being its own series
dry_run – Set to true to perform a “dry run” of the loading. It is up to each implementation to determine what that means.
kwargs – Additional data that might be passed by data_provider.
- tag_normalizer(sensors: list[Union[dict[str, Optional[str]], str, gordo_core.sensor_tag.SensorTag]], **kwargs: str | None) list[Union[str, gordo_core.sensor_tag.SensorTag]][source]#
Prepares and validates the sensors list. Subclasses may find it useful to override this method.
- tags_required_fields = ()#
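The contract described above can be sketched without installing gordo_core. This is only an illustration under stated assumptions: `BaseProviderSketch` and `InMemoryProvider` are hypothetical stand-ins, plain lists of `(timestamp, value)` pairs replace pandas Series, the end of the time window is treated as exclusive, and a dry run is interpreted as yielding nothing (the documentation leaves that choice to each implementation).

```python
from abc import ABC, abstractmethod
from datetime import datetime, timedelta, timezone


class BaseProviderSketch(ABC):
    """Hypothetical stand-in mirroring the documented interface (not gordo_core itself)."""

    @abstractmethod
    def can_handle_tag(self, tag):
        ...

    @abstractmethod
    def load_series(self, train_start_date, train_end_date, tag_list, dry_run=False, **kwargs):
        ...

    def get_closest_datapoint(self, tag, before_time, point_max_look_back):
        # Optional hook: the documented behaviour when not implemented is to raise.
        raise NotImplementedError()


class InMemoryProvider(BaseProviderSketch):
    """Hypothetical provider serving data from a dict of tag -> [(timestamp, value)]."""

    def __init__(self, data):
        self._data = data

    def can_handle_tag(self, tag):
        return tag in self._data

    def load_series(self, train_start_date, train_end_date, tag_list, dry_run=False, **kwargs):
        if dry_run:
            return  # assumption: a dry run yields no data
        for tag in tag_list:
            # Assumption: start is inclusive, end is exclusive.
            points = [
                (ts, value)
                for ts, value in self._data[tag]
                if train_start_date <= ts < train_end_date
            ]
            yield points, tag  # one (series, tag) pair per requested tag


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
data = {"TAG-1": [(start + timedelta(hours=h), float(h)) for h in range(4)]}
provider = InMemoryProvider(data)
loaded = list(provider.load_series(start, start + timedelta(hours=2), ["TAG-1"]))
```

Each element of `loaded` pairs the values for one tag with the tag itself, matching the `Iterable[Tuple[Series, str | SensorTag]]` shape of the documented return type.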
- class gordo_core.data_providers.providers.InfluxDataProvider(measurement: str, value_name: str = 'Value', api_key: str | None = None, api_key_header: str | None = None, client: DataFrameClient | None = None, uri: str | None = None, **kwargs)[source]#
Bases: GordoBaseDataProvider
- Parameters:
measurement – Name of the measurement to select from in Influx
value_name – Name of the value to select, defaults to ‘Value’
api_key – API key to use in the header
api_key_header – Key of header to insert the api key for requests
uri – Create a client from a URI in the format <username>:<password>@<host>:<port>/<optional-path>/<db_name>
kwargs – These are passed directly to the init args of influxdb.DataFrameClient
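To make the uri format concrete, the following sketch splits such a string into its parts. `split_influx_uri` and `InfluxUriParts` are hypothetical names introduced here for illustration; this is not how InfluxDataProvider itself parses the value, and malformed input (for example a missing port) is not handled.

```python
from typing import NamedTuple


class InfluxUriParts(NamedTuple):
    username: str
    password: str
    host: str
    port: int
    path: str  # empty string when no optional path is given
    db_name: str


def split_influx_uri(uri: str) -> InfluxUriParts:
    """Split <username>:<password>@<host>:<port>/<optional-path>/<db_name>."""
    # Split on the last "@" so a password containing "@" still parses.
    credentials, _, location = uri.rpartition("@")
    username, _, password = credentials.partition(":")
    host_port, _, path_and_db = location.partition("/")
    host, _, port = host_port.partition(":")
    # Everything before the final "/" is the optional path; the rest is the database name.
    path, _, db_name = path_and_db.rpartition("/")
    return InfluxUriParts(username, password, host, int(port), path, db_name)


parts = split_influx_uri("user:secret@localhost:8086/api/v1/sensors")
```

Here `parts.path` is `"api/v1"` and `parts.db_name` is `"sensors"`; with no optional path, `path` comes back as an empty string.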
- can_handle_tag(tag: str | SensorTag)[source]#
Returns True if the data provider thinks it can possibly read this tag. Typically checks whether the asset part of the tag is known to the reader.
- get_list_of_tags() list[str]#
Queries Influx for the list of tags, using a TTL cache of 600 seconds. The cache can be cleared with provider.get_list_of_tags.cache_clear(), as is usual with cachetools.
- Return type:
The list of tags in Influx
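The caching behaviour can be illustrated with a small standard-library stand-in. `ttl_cached` below is a hypothetical decorator written for this sketch, not the cachetools implementation, and the decorated function is a plain function rather than a provider method; it only shows why calling `cache_clear()` forces the next call to query again.

```python
import time
from functools import wraps


def ttl_cached(ttl_seconds):
    """Hypothetical stdlib stand-in for a cachetools-style TTL cache decorator."""

    def decorator(func):
        cache = {}  # maps call arguments -> (expiry time, cached value)

        @wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            hit = cache.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]  # entry is still fresh
            value = func(*args)
            cache[args] = (now + ttl_seconds, value)
            return value

        wrapper.cache_clear = cache.clear  # same method name the documentation mentions
        return wrapper

    return decorator


queries = []  # records each time the "database" is actually queried


@ttl_cached(ttl_seconds=600)
def get_list_of_tags():
    queries.append("query")
    return ["TAG-1", "TAG-2"]


get_list_of_tags()
get_list_of_tags()  # served from the cache, no new query
calls_before_clear = len(queries)
get_list_of_tags.cache_clear()
get_list_of_tags()  # cache was emptied, so the query runs again
calls_after_clear = len(queries)
```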
- load_series(train_start_date: datetime, train_end_date: datetime, tag_list: list[Union[str, gordo_core.sensor_tag.SensorTag]], dry_run: bool | None = False, **kwargs) Iterable[Tuple[Series, str | SensorTag]][source]#
See GordoBaseDataProvider for documentation.
- exception gordo_core.data_providers.providers.NoSuitableDataProviderError[source]#
Bases: ValueError
- class gordo_core.data_providers.providers.RandomDataProvider(min_size=100, max_size=300)[source]#
Bases:
Bases: GordoBaseDataProvider
Get GordoBaseDataset which returns unstructured values for X and y. Each instance uses the same seed, so it should behave as a function (same input -> same output).
- can_handle_tag(tag: str | SensorTag)[source]#
Returns True if the data provider thinks it can possibly read this tag. Typically checks whether the asset part of the tag is known to the reader.
- get_closest_datapoint(tag: str | SensorTag, before_time: datetime, point_max_look_back: Timedelta) Series | None[source]#
Uses the same logic as the method in the parent class.
- load_series(train_start_date: datetime, train_end_date: datetime, tag_list: list[Union[str, gordo_core.sensor_tag.SensorTag]], dry_run: bool | None = False, **kwargs) Iterable[Tuple[Series, str | SensorTag]][source]#
Load the required data as an iterable of series where each contains the values of the tag with time index.
- Parameters:
train_start_date – Datetime object representing the start of fetching data
train_end_date – Datetime object representing the end of fetching data
tag_list – List of tags to fetch, where each will end up being its own series
dry_run – Set to true to perform a “dry run” of the loading. It is up to each implementation to determine what that means.
kwargs – Additional data that might be passed by data_provider.
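The "same seed, same output" property noted for RandomDataProvider can be demonstrated with the standard library alone. `random_series` is a hypothetical stand-in, not the provider's code; in particular, reading min_size and max_size as bounds on the number of generated points is an assumption, since the reference does not document those arguments.

```python
import random
from datetime import datetime, timedelta, timezone


def random_series(start, end, min_size=100, max_size=300, seed=0):
    """Hypothetical stand-in: seeded pseudo-random values spread over [start, end)."""
    rng = random.Random(seed)  # a fresh generator per call, always seeded the same way
    size = rng.randint(min_size, max_size)  # assumption: sizes bound the point count
    step = (end - start) / size
    return [(start + i * step, rng.random()) for i in range(size)]


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = start + timedelta(days=1)
first = random_series(start, end)
second = random_series(start, end)
same_output = first == second  # identical inputs give identical series
```

Because the generator is re-seeded on every call, repeated loads over the same window return the same values, which is what makes such a provider convenient in tests.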
Data provider examples:
- class gordo_core.data_providers.contrib.csv_provider.CSVDataProvider(file_path: str | Path, timestamp_column: str, sep: str = ',')[source]#
Bases: GordoBaseDataProvider
- Parameters:
file_path – Path to a CSV file containing the data to be loaded.
timestamp_column – Column in the CSV file containing the timestamps for each row.
sep – Delimiter to use.
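How the three parameters fit together can be sketched with the standard csv module. `read_tag_columns` is a hypothetical helper, not CSVDataProvider's implementation; it assumes that every column other than the timestamp column holds the numeric values of one tag and that timestamps are ISO 8601 strings.

```python
import csv
import io
from datetime import datetime


def read_tag_columns(csv_file, timestamp_column, sep=","):
    """Hypothetical helper: return {tag: [(timestamp, value), ...]} from an open CSV file."""
    reader = csv.DictReader(csv_file, delimiter=sep)
    series = {}
    for row in reader:
        # Assumption: timestamps are ISO 8601 strings.
        timestamp = datetime.fromisoformat(row.pop(timestamp_column))
        # Assumption: every remaining column is one tag with numeric values.
        for tag, raw_value in row.items():
            series.setdefault(tag, []).append((timestamp, float(raw_value)))
    return series


sample = io.StringIO(
    "time;TAG-1;TAG-2\n"
    "2024-01-01T00:00:00;1.0;10.0\n"
    "2024-01-01T01:00:00;2.0;20.0\n"
)
columns = read_tag_columns(sample, timestamp_column="time", sep=";")
```

An in-memory file is used here so the sketch runs without touching disk; with a real file, the object returned by `open(file_path, newline="")` would take the place of `sample`.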