Commit d4f05173 authored by noureen.taj

Update assignment3.md

parent c691e9da
# Assignment 3
Hey there! Welcome to the Knowledge Lens Intern Training Program.
This assignment serves as a quick refresher on using NoSQL and time-series databases.
There are three tasks in this assignment. On completing them, you'll have learned:
* How to interact with MongoDB
* How to use a Pandas DataFrame to generate your own Excel reports
* How to ingest data into the Kairos time-series database and query it
* How to publish and consume messages via the MQTT protocol
* How to cache data using Redis

Happy Coding! :tada:
## :pushpin: Task 1: Working with Mongo - Advanced
### :golf: Areas covered:
- Working with NoSQL
- Working with Pandas
### :books: Description:
You are given a dataset of restaurant reviews in the form of a JSON file. The end goal of the project is to create an API interface that will provide the following:
1. The business name with the highest average review score.
...@@ -18,75 +31,86 @@ You are given with a dataset of a restaurant review in the form of a JSON file.
Sample Document:
```json
{
  "address": {
    "building": "120",
    "coord": [
      -73.9998042,
      40.7251256
    ],
    "street": "Prince Street",
    "zipcode": "10012"
  },
  "borough": "Manhattan",
  "cuisine": "Bakery",
  "grades": [
    {
      "date": {
        "$date": "2014-10-17T00:00:00.000Z"
      },
      "grade": "A",
      "score": 11
    },
    {
      "date": {
        "$date": "2013-09-18T00:00:00.000Z"
      },
      "grade": "A",
      "score": 13
    },
    {
      "date": {
        "$date": "2013-04-30T00:00:00.000Z"
      },
      "grade": "A",
      "score": 7
    },
    {
      "date": {
        "$date": "2012-04-20T00:00:00.000Z"
      },
      "grade": "A",
      "score": 7
    },
    {
      "date": {
        "$date": "2011-12-19T00:00:00.000Z"
      },
      "grade": "A",
      "score": 3
    }
  ],
  "name": "Olive'S",
  "restaurant_id": "40363151"
}
```
Bonus Points: Use the MongoDB aggregation framework.
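
As a starting point for the bonus, here is a minimal sketch of an aggregation pipeline that ranks restaurants by the average of their `grades.score` values. It assumes that "review" refers to the `score` field of the sample document and that a higher score is better; both are assumptions, so adjust the sort order if your reading of the data differs. The function name, connection string, database name, and collection name are hypothetical.

```python
from typing import Any, Dict, List


def build_top_average_pipeline(limit: int = 1) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline ranking restaurants by average grade score."""
    return [
        # Emit one document per entry of the "grades" array.
        {"$unwind": "$grades"},
        # Average the scores of each restaurant and keep its name.
        {
            "$group": {
                "_id": "$restaurant_id",
                "name": {"$first": "$name"},
                "average_score": {"$avg": "$grades.score"},
                "review_count": {"$sum": 1},
            }
        },
        # Highest average first (assumption: a higher score is a better review).
        {"$sort": {"average_score": -1}},
        {"$limit": limit},
    ]


# Usage with PyMongo (hypothetical connection string, database and collection):
#   from pymongo import MongoClient
#   collection = MongoClient("mongodb://localhost:27017")["training"]["restaurants"]
#   for row in collection.aggregate(build_top_average_pipeline()):
#       print(row["name"], row["average_score"])
```

Keeping the pipeline in a plain function makes it easy to inspect or unit-test without a running MongoDB instance.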
### :wrench: Tools to use:
1. PyCharm / VSCode
2. Robo3T / Studio3T / MongoDB Compass
3. PyMongo
### :mag: References:
* [Querying Documents on Mongo](https://www.mongodb.com/docs/manual/tutorial/query-documents/)
* [Quick Summary on Mongo Aggregation Stages](https://www.mongodb.com/docs/manual/reference/operator/aggregation-pipeline/)
* [Generating Excel Sheets from a Pandas Dataframe](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html)
* [How to return files on FastAPI response](https://fastapi.tiangolo.com/advanced/custom-response/#fileresponse)
* [PyMongo Official Documentation](https://pymongo.readthedocs.io/en/stable/)
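
To connect the Pandas and Excel references to the sample document, the sketch below flattens one restaurant document into one row per grade and writes the rows to a spreadsheet. It assumes the documents are read from the raw JSON file, where dates appear as `{"$date": ...}`; documents fetched through PyMongo would carry `datetime` values instead. The helper names `flatten_grades` and `write_report` and the file name `report.xlsx` are hypothetical.

```python
from typing import Any, Dict, List


def flatten_grades(doc: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one flat row per entry in the document's "grades" array."""
    rows = []
    for grade in doc.get("grades", []):
        rows.append(
            {
                "restaurant_id": doc["restaurant_id"],
                "name": doc["name"],
                "borough": doc["borough"],
                "cuisine": doc["cuisine"],
                # Raw JSON export stores dates as {"$date": "<ISO string>"}.
                "date": grade["date"]["$date"],
                "grade": grade["grade"],
                "score": grade["score"],
            }
        )
    return rows


def write_report(rows: List[Dict[str, Any]], path: str = "report.xlsx") -> None:
    """Write the rows to an Excel file (needs pandas plus an engine such as openpyxl)."""
    import pandas as pd  # imported lazily so flatten_grades works without pandas

    pd.DataFrame(rows).to_excel(path, index=False)
```

A FastAPI endpoint could then hand the generated file back to the caller with `FileResponse`, as described in the reference above.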
_________________________________
## :pushpin: Task 2: Working with Time-series
### :golf: Areas covered:
- Time-series operations
- Working with time-series data
- Working with Pandas
### :books: Description:
You are given a weather dataset in the form of a CSV file. The end goal of the project is to create an API interface that will provide the following:
...@@ -99,24 +123,32 @@ Sample Document:
|31/12/2004 01:00|13478 |
|31/12/2004 02:00|12865 |
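
Rows like the ones above can be turned into KairosDB request bodies roughly as follows. This is a sketch under stated assumptions: timestamps are treated as UTC, the metric name `weather.reading` and the `source` tag are hypothetical (the column names are not shown in this excerpt), and the host in the usage comment is a placeholder. The payload shapes follow the KairosDB REST API, which expects millisecond timestamps and at least one tag per metric.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List, Optional, Tuple


def to_epoch_ms(stamp: str) -> int:
    """Convert 'DD/MM/YYYY HH:MM' (assumed UTC) to milliseconds since the epoch."""
    parsed = datetime.strptime(stamp, "%d/%m/%Y %H:%M").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


def build_ingest_payload(
    rows: Iterable[Tuple[str, float]],
    metric: str = "weather.reading",  # hypothetical metric name
    tags: Optional[Dict[str, str]] = None,
) -> List[Dict[str, Any]]:
    """Build the JSON body for POST /api/v1/datapoints."""
    return [
        {
            "name": metric,
            "datapoints": [[to_epoch_ms(stamp), value] for stamp, value in rows],
            "tags": tags or {"source": "csv"},  # hypothetical tag
        }
    ]


def build_daily_average_query(metric: str, start_ms: int, end_ms: int) -> Dict[str, Any]:
    """Build the JSON body for POST /api/v1/datapoints/query with a daily average."""
    return {
        "start_absolute": start_ms,
        "end_absolute": end_ms,
        "metrics": [
            {
                "name": metric,
                "aggregators": [
                    {"name": "avg", "sampling": {"value": 1, "unit": "days"}}
                ],
            }
        ],
    }


# Usage with the requests package (hypothetical host):
#   import requests
#   body = build_ingest_payload([("31/12/2004 01:00", 13478)])
#   requests.post("http://localhost:8080/api/v1/datapoints", json=body)
```

With Pandas, the `rows` argument could come from something like `zip(df[time_col], df[value_col])` after reading the CSV.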
### :wrench: Tools to use:
1. PyCharm / VSCode
2. Pandas
3. Kairos
### :mag: References:
* [How to query Kairos DB using Metrics](https://kairosdb.github.io/docs/restapi/QueryMetrics.html)

------------------------------------------------------
## :pushpin: Task 3: Working with MQTT & Redis
### :golf: Areas covered:
- MQTT Protocol
- Caching using Redis DB
### :books: Description:
Data from different sites will be pushed via MQTT every 10 seconds for the parameters PM10, PM2.5, SO2, and NO2.
The data can be of different quality: Good (0), Maintenance (1), or Error (2).
Based on the quality of the data, store it in a different Redis database.
Sample data format:
```json
...@@ -129,15 +161,16 @@ sample data format:
```
Use Redis for caching/storing information.
Create consumers that consume data from these topics and store it in a Redis database based on the data quality.
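
One way to structure such a consumer is sketched below. The sample payload is not shown in this excerpt, so the field names `site`, `parameter`, and `quality`, the topic `sites/#`, and the host names are all hypothetical placeholders. The sketch also assumes paho-mqtt 2.x, where `Client` takes a callback API version (with 1.x, `mqtt.Client()` is called without it), and it maps each quality code to the Redis database with the same index.

```python
import json
from typing import Tuple

# Quality codes from the task description, mapped to Redis database indexes.
QUALITY_TO_DB = {0: 0, 1: 1, 2: 2}  # Good, Maintenance, Error


def route_message(payload: bytes) -> Tuple[int, str, str]:
    """Decide which Redis database, key and value an MQTT payload maps to."""
    message = json.loads(payload)
    db_index = QUALITY_TO_DB[int(message["quality"])]  # hypothetical field name
    key = f'{message["site"]}:{message["parameter"]}'  # hypothetical field names
    return db_index, key, json.dumps(message)


def run_consumer(broker: str = "localhost", topic: str = "sites/#") -> None:
    """Subscribe to the topic and store every message in the matching Redis db."""
    import paho.mqtt.client as mqtt  # pip package: paho-mqtt (2.x assumed)
    import redis  # pip package: redis

    # One client per quality level, each bound to its own database index.
    stores = {
        db: redis.Redis(host="localhost", port=6379, db=db)
        for db in QUALITY_TO_DB.values()
    }

    def on_message(client, userdata, msg):
        db_index, key, value = route_message(msg.payload)
        stores[db_index].set(key, value)

    client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
    client.on_message = on_message
    client.connect(broker, 1883)
    client.subscribe(topic)
    client.loop_forever()  # blocks and dispatches incoming messages
```

Separating `route_message` from the network code keeps the routing rule testable without a broker or a Redis server.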
### :wrench: Tools to use:
1. PyCharm / VSCode
2. MQTT - (pip package: `paho-mqtt`)
3. Redis - (pip package: `redis`)
### :mag: References:
* [Using MQTT in Python](https://www.emqx.com/en/blog/how-to-use-mqtt-in-python)
* [Connection to Redis in Python](https://docs.redis.com/latest/rs/references/client_references/client_python/)