Кратко:

  • Датасеты часто выкладывают в виде CSV- или TSV-файлов.
  • Сохраняйте датасеты в объектное хранилище и анализируйте данные с помощью ClickHouse.
  • В качестве примера возьмем датасет с историей метеонаблюдений за 10 лет.
  • На панели слева выберите вкладку SQL и введите пароль пользователя.
  • В правом поле открывшейся консоли мы и станем вводить SQL-запросы.
  • Выполните запрос: SELECT City, LocalDate, TempC FROM s3('https://storage.yandexcloud.net/arhipov/weather_data.tsv', 'TSV', 'LocalDateTime DateTime, LocalDate Date, Month Int8, Day Int8, TempC Float32, Pressure Float32, RelHumidity Int32, WindSpeed10MinAvg Int32, VisibilityKm Float32, City String').
  • Измените поля в запросе, чтобы проверить разницу относительной влажности.
SELECT
    City,
    LocalDate,
    TempC
FROM s3(
        'https://storage.yandexcloud.net/arhipov/weather_data.tsv',
        'TSV',
        'LocalDateTime DateTime, LocalDate Date, Month Int8, Day Int8, TempC Float32,Pressure Float32, RelHumidity Int32, WindSpeed10MinAvg Int32, VisibilityKm Float32, City String')
ORDER BY TempC ASC
LIMIT 1

Проверьте себя

SELECT
    Year,
    msk.t - spb.t
FROM
(
    SELECT
        toYear(LocalDate) AS Year,
        avg(TempC) AS t
    FROM s3(
        'https://storage.yandexcloud.net/arhipov/weather_data.tsv',
        'TSV',
        'LocalDateTime DateTime, LocalDate Date, Month Int8, Day Int8, TempC Float32,Pressure Float32, RelHumidity Int32, WindSpeed10MinAvg Int32, VisibilityKm Float32, City String')
    WHERE City = 'Moscow'
    GROUP BY Year
    ORDER BY Year ASC
) AS msk
INNER JOIN
(
    SELECT
        toYear(LocalDate) AS Year,
        avg(TempC) AS t
    FROM s3(
        'https://storage.yandexcloud.net/arhipov/weather_data.tsv',
        'TSV',
        'LocalDateTime DateTime, LocalDate Date, Month Int8, Day Int8, TempC Float32,Pressure Float32, RelHumidity Int32, WindSpeed10MinAvg Int32, VisibilityKm Float32, City String')
    WHERE City = 'Saint-Petersburg'
    GROUP BY Year
    ORDER BY Year ASC
) AS spb ON msk.Year = spb.Year
SELECT
    City,
    toYear(LocalDate) AS year,
    MIN(LocalDate)
FROM
(
    SELECT
        City,
        LocalDate,
        windowFunnel(864000)(LocalDateTime, TempC >= 15, TempC >= 15, TempC >= 15, TempC >= 15, TempC >= 15) AS warmdays
    FROM s3(
        'https://storage.yandexcloud.net/arhipov/weather_data.tsv',
        'TSV',
        'LocalDateTime DateTime, LocalDate Date, Month Int8, Day Int8, TempC Float32,Pressure Float32, RelHumidity Int32, WindSpeed10MinAvg Int32, VisibilityKm Float32, City String')
    GROUP BY
        City,
        LocalDate
)
WHERE warmdays = 5
GROUP BY
    year,
    City
ORDER BY
    year ASC,
    City ASC