NSE has ~9,000 instruments. Here’s how I collect real-time tick data for all of them, store it cheaply, and query it after market close using DuckDB.
Tech Stack
- KiteConnect broker for real-time data
- DuckDB for querying
- FastAPI
- Parquet file format
- AWS
The Architecture
- Starting the server: Before the market opens, AWS EventBridge starts the server and sets up WebSocket connections with the broker. Zerodha supports at most 3,000 instruments per connection, so we need multiple connections to cover all of them.
- Accept Ticks: Once the market opens, we start receiving ticks on each WebSocket connection and push them into a local queue.
- Writing data to storage: A consumer watches the queue; once the queue reaches the batch size, it pops a batch of ticks and appends it to the local Parquet file.
- Making data available for query: Once the market closes, we close the local Parquet file and upload it to S3.
- Shutdown server: After the upload completes, we shut down the instance to save cost.
- How data is formatted: Inside S3, all the Parquet files live under a Hive-style partitioned path, i.e. ticks/date=DD-MM-YYYY/ticks.parquet. This structure lets us query the Parquet dataset efficiently, because a filter on date only touches the matching directory:
-- the DuckDB engine only scans ticks/date=29-05-2026/ instead of all files.
SELECT *
FROM read_parquet('ticks/*/ticks.parquet', hive_partitioning = true)
WHERE date = '29-05-2026';
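
To make the "Starting the server" step concrete, here is a minimal sketch of splitting the instrument list across connections under the 3,000-per-connection limit mentioned above. The function name and the placeholder tokens are my own for illustration; in the real setup the tokens would come from the broker's instrument list, and each group would be subscribed on its own WebSocket connection.

```python
from typing import List, Sequence

MAX_TOKENS_PER_CONNECTION = 3000  # per-connection limit quoted in the post


def split_into_connections(
    tokens: Sequence[int], limit: int = MAX_TOKENS_PER_CONNECTION
) -> List[List[int]]:
    """Split instrument tokens into consecutive groups of at most `limit`."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [list(tokens[i:i + limit]) for i in range(0, len(tokens), limit)]


# Placeholder tokens (hypothetical); real ones come from the broker.
tokens = list(range(1, 9001))
groups = split_into_connections(tokens)
print(len(groups), [len(g) for g in groups])  # 3 [3000, 3000, 3000]
```

With roughly 9,000 instruments this works out to three connections, one group each.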
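
The "Accept Ticks" and "Writing data to storage" steps can be sketched with the standard library alone. This is a simplified variation, not the exact code I run: the consumer buffers ticks locally and flushes every batch_size items rather than inspecting the queue length, and the sink is just a callable. In the real pipeline the sink would append each batch to the local Parquet file (for example via pyarrow's ParquetWriter.write_table, which is an assumption here). The STOP sentinel and the tick fields are illustrative.

```python
import queue
import threading
from typing import Callable, List

STOP = object()  # sentinel pushed onto the queue when the market closes


def consume(q: queue.Queue, batch_size: int,
            sink: Callable[[List[dict]], None]) -> None:
    """Drain ticks from q and hand them to sink in batches of batch_size."""
    batch: List[dict] = []
    while True:
        item = q.get()
        if item is STOP:
            break
        batch.append(item)
        if len(batch) >= batch_size:
            sink(batch)
            batch = []
    if batch:  # flush the partial batch left over at market close
        sink(batch)


written: List[List[dict]] = []  # stands in for the Parquet file
q: queue.Queue = queue.Queue()
worker = threading.Thread(target=consume, args=(q, 4, written.append))
worker.start()

# In production the WebSocket callbacks would do the q.put(...) calls.
for i in range(10):
    q.put({"instrument_token": i, "last_price": 100.0 + i})
q.put(STOP)
worker.join()

print([len(b) for b in written])  # [4, 4, 2]
```

Flushing the leftover partial batch matters: without it, the last few ticks before the close would never reach the file.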
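
For the upload step, the object key has to follow the ticks/date=DD-MM-YYYY/ticks.parquet layout exactly, otherwise the date filter will not line up with the directory names. A small helper like the one below keeps that in one place; the function name and the prefix parameter are hypothetical.

```python
from datetime import date


def partition_key(trading_day: date, prefix: str = "ticks") -> str:
    """Build the Hive-style key for one trading day's Parquet file."""
    return f"{prefix}/date={trading_day.strftime('%d-%m-%Y')}/ticks.parquet"


print(partition_key(date(2026, 5, 29)))  # ticks/date=29-05-2026/ticks.parquet
```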