NSE has ~9,000 instruments. Here’s how I collect real-time tick data for all of them, store it cheaply, and query it after market close using DuckDB.

Tech Stack

  1. KiteConnect broker for real-time data
  2. DuckDB for querying
  3. FastAPI
  4. Parquet file format
  5. AWS

The Architecture

  • Starting the server: Before the market opens, Amazon EventBridge starts the server, which then sets up WebSocket connections with the broker. Zerodha supports at most 3,000 instruments per connection, so collecting more than 3,000 instruments requires multiple connections.
  • Accepting ticks: Once the market opens, we start receiving ticks over each WebSocket connection and push them into a local queue.
  • Writing data to storage: A consumer watches the queue, and once the queue reaches the batch size, it pops a batch of ticks and appends it to the local Parquet file.
  • Making data available for query: Once the market closes, we close the local Parquet file and upload it to S3.
  • Shutting down the server: After the upload completes, we shut down the instance to save cost.
  • How data is formatted: Inside S3, all the Parquet files live under a Hive-style partitioned path, i.e. ticks/date=DD-MM-YYYY/ticks.parquet. This layout lets DuckDB skip every partition that a query's filter rules out, so the dataset can be queried efficiently.
-- DuckDB only scans ticks/date=29-05-2026/ instead of all files.
WHERE date = '29-05-2026'
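
Going back to the first step, the 3,000-instruments-per-connection limit means the instrument list has to be split before subscribing. Here is a minimal sketch of that split. The function name and the token list are made up for illustration, and each resulting group would then be subscribed on its own WebSocket connection.

```python
from typing import List, Sequence

# Per-connection limit quoted in the post.
MAX_TOKENS_PER_CONNECTION = 3000


def split_into_connections(
    tokens: Sequence[int], limit: int = MAX_TOKENS_PER_CONNECTION
) -> List[List[int]]:
    """Split instrument tokens into groups, one group per WebSocket connection."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [list(tokens[i:i + limit]) for i in range(0, len(tokens), limit)]


# Hypothetical token list standing in for the ~9,000 NSE instruments.
tokens = list(range(1, 9001))
groups = split_into_connections(tokens)
print(len(groups), [len(g) for g in groups])  # 3 [3000, 3000, 3000]
```

With exactly 9,000 tokens this gives three full connections. Any remainder would end up in a smaller final group.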
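
The queue-and-consumer step can be sketched with the standard library alone. This is an assumption-heavy illustration, not the actual implementation: the batch size, the sentinel object, and the write_batch callback (a stand-in for appending to the Parquet file) are all hypothetical. It also buffers ticks inside the consumer rather than watching the queue length, which is a small variation on the description above.

```python
import queue
import threading
from typing import Callable, List

BATCH_SIZE = 3    # tiny for the demo; a real batch would be far larger
_STOP = object()  # hypothetical sentinel pushed once the market closes


def consume(
    tick_queue: queue.Queue,
    write_batch: Callable[[List[dict]], None],
    batch_size: int = BATCH_SIZE,
) -> None:
    """Drain ticks from the queue and hand them to write_batch in batches."""
    batch: List[dict] = []
    while True:
        item = tick_queue.get()
        if item is _STOP:
            break
        batch.append(item)
        if len(batch) >= batch_size:
            write_batch(batch)
            batch = []  # start a fresh list so the written batch is not mutated
    if batch:
        write_batch(batch)  # flush whatever is left at market close


# Demo: collect batches in memory instead of writing Parquet.
written: List[List[dict]] = []
q: queue.Queue = queue.Queue()
worker = threading.Thread(target=consume, args=(q, written.append))
worker.start()
for i in range(7):
    q.put({"instrument_token": i, "last_price": 100.0 + i})
q.put(_STOP)
worker.join()
print([len(b) for b in written])  # [3, 3, 1]
```

Batching keeps the number of writes low, and the final flush makes sure a partial batch is not lost when the market closes.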
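
To tie the partition layout to the WHERE clause above, here is a rough sketch of how the S3 key and the full query might be built. The bucket name, function names, and SELECT list are assumptions. Reading from S3 would also need DuckDB's httpfs extension and credentials configured, which this sketch leaves out.

```python
from datetime import date


def partition_key(day: date) -> str:
    """Object key for one day's ticks, using the DD-MM-YYYY layout from the post."""
    return f"ticks/date={day.strftime('%d-%m-%Y')}/ticks.parquet"


def build_query(root: str, day: date) -> str:
    """SQL that filters on the partition column so only one folder is scanned."""
    return (
        f"SELECT * FROM read_parquet('{root}/ticks/*/*.parquet', "
        f"hive_partitioning = true) "
        f"WHERE date = '{day.strftime('%d-%m-%Y')}'"
    )


def run_query(root: str, day: date):
    """Run the query with DuckDB (third-party package, imported lazily)."""
    import duckdb

    return duckdb.sql(build_query(root, day)).fetchall()


day = date(2026, 5, 29)
print(partition_key(day))  # ticks/date=29-05-2026/ticks.parquet
print(build_query("s3://example-bucket", day))  # hypothetical bucket name
```

After market close, something like run_query("s3://example-bucket", day) would return that day's ticks without touching the other date folders.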