Commit 2baca000 authored by noureen.taj

Update assignment2.md

parent 3b748150
# Assignment 1
Hey there! Welcome to the Knowledge Lens Intern Training Program.
This assignment will serve as a quick refresher on the usage of NoSQL and time-series databases.
There are three tasks in this assignment, on completion of which you'll learn:
* How to interact with MongoDB
* Using Pandas DataFrames to generate your own Excel reports
* Leveraging the Kairos time-series database for data ingestion and querying
* Publishing and Consuming messages via MQTT protocol
* Caching mechanism using Redis DB
Happy Coding! :tada:
## :pushpin: Task 1: Working with Mongo - Advanced
### :golf: Areas covered:
- Timeseries Operation
- Working with NoSQL
- Working with Pandas
### :books: Description:
You are given a dataset from a bicycle rental company in the form of a JSON file. The end goal of the project is to create an API interface that will provide the following:
1. Get the user who has the highest trip duration.
2. Get the user who has used the service the most.
3. Generate Excel Report based on `bike id`, `station name` and `start date`
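
One possible shape for the Excel report in item 3, as a rough sketch rather than a reference solution. It assumes the documents were imported so that `start time` is stored as a BSON date and that "station name" refers to `start station name` in the sample document of this task. The function names, the `collection` argument (expected to be a PyMongo collection) and the chosen report columns are illustrative assumptions, not part of the assignment.

```python
from datetime import datetime, timedelta

# Flat columns only: nested values such as the location objects cannot be
# written to an Excel cell directly. The column choice is an assumption.
REPORT_COLUMNS = ["bikeid", "start station name", "start time", "tripduration"]


def build_trip_filter(bike_id=None, station_name=None, start_date=None):
    """Build a MongoDB filter from the optional report parameters."""
    query = {}
    if bike_id is not None:
        query["bikeid"] = bike_id
    if station_name is not None:
        query["start station name"] = station_name
    if start_date is not None:
        # Match every trip that started on the given calendar day.
        day_start = datetime(start_date.year, start_date.month, start_date.day)
        query["start time"] = {"$gte": day_start, "$lt": day_start + timedelta(days=1)}
    return query


def export_trips_to_excel(collection, path, **filters):
    """Write the matching trips to an .xlsx file and return its path."""
    # Imported lazily; pandas also needs an Excel engine such as openpyxl.
    import pandas as pd

    projection = {"_id": 0, **{column: 1 for column in REPORT_COLUMNS}}
    rows = list(collection.find(build_trip_filter(**filters), projection))
    pd.DataFrame(rows, columns=REPORT_COLUMNS).to_excel(path, index=False)
    return path
```

A call such as `export_trips_to_excel(collection, "report.xlsx", bike_id=22794)` would then produce a file that the API can return to the caller.
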
Sample Document:
```json
{
    "tripduration": 889,
    "start station id": 268,
    "start station name": "Howard St & Centre St",
    "end station id": 3002,
    "end station name": "South End Ave & Liberty St",
    "bikeid": 22794,
    "usertype": "Subscriber",
    "birth year": 1961,
    "start station location": {
        "type": "Point",
        "coordinates": [
            -73.99973337,
            40.71910537
        ]
    },
    "end station location": {
        "type": "Point",
        "coordinates": [
            -74.015756,
            40.711512
        ]
    },
    "start time": {
        "$date": "2016-01-01T00:01:06.000Z"
    },
    "stop time": {
        "$date": "2016-01-01T00:15:56.000Z"
    }
}
```
Bonus Points: Use Mongo Aggregate framework
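
For the bonus, the first two questions could be expressed as aggregation pipelines along these lines. This is only a sketch under assumptions: the sample document carries no explicit user identifier, so the grouping on `usertype` and `birth year` is a stand-in that should be replaced with whatever identifies a user in the real dataset. The function names are hypothetical, and `collection` is expected to be a PyMongo collection.

```python
def longest_trip_pipeline():
    """Return the single document with the largest `tripduration`."""
    return [
        {"$sort": {"tripduration": -1}},
        {"$limit": 1},
    ]


def most_trips_pipeline(group_fields=("usertype", "birth year")):
    """Count trips per group and keep only the group with the most trips."""
    # Underscored keys inside _id are just a convenience when reading results.
    group_id = {field.replace(" ", "_"): f"${field}" for field in group_fields}
    return [
        {"$group": {"_id": group_id, "trips": {"$sum": 1}}},
        {"$sort": {"trips": -1}},
        {"$limit": 1},
    ]


def first_result(collection, pipeline):
    """Run a pipeline and return its first document, or None if it is empty."""
    return next(iter(collection.aggregate(pipeline)), None)
```

Keeping the pipelines as plain lists of stages makes them easy to paste into Robo3T or MongoDB Compass for inspection before wiring them into the API.
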
### :wrench: Tools to use:
1. Pycharm / VSCode
2. Robo3T / Studio3T / MongoDB Compass
3. PyMongo
### :mag: References:
* [Querying Documents on Mongo](https://www.mongodb.com/docs/manual/tutorial/query-documents/)
* [Quick Summary on Mongo Aggregation Stages](https://www.mongodb.com/docs/manual/reference/operator/aggregation-pipeline/)
* [Generating Excel Sheets from a Pandas Dataframe](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html)
* [How to return files on FastAPI response](https://fastapi.tiangolo.com/advanced/custom-response/#fileresponse)
* [PyMongo Official Documentation](https://pymongo.readthedocs.io/en/stable/)
_________________________________
## :pushpin: Task 2: Working with Timeseries
### :golf: Areas covered:
- Timeseries Operation
- Working with Timeseries
- Working with Pandas
### :books: Description:
You are given a weather dataset in the form of a CSV file. The end goal of the project is to create an API interface that will provide the following:
1. Get daily, weekly and monthly aggregates (min, max, and average) of the data and generate a report in Excel format.
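
The aggregation step could be sketched with Pandas resampling as below. The column names `timestamp` and `temperature` and the three sample rows are made up for illustration, since the real CSV header is not shown here; the function names are likewise hypothetical.

```python
import io

import pandas as pd

# Pandas offset aliases for the three report periods.
FREQUENCIES = {"daily": "D", "weekly": "W", "monthly": "MS"}


def aggregate_weather(df, time_col, value_col, period):
    """Return min, max and mean of `value_col` per day, week or month."""
    # utc=True copes with rows that carry different UTC offsets.
    timestamps = pd.to_datetime(df[time_col], utc=True)
    series = df.set_index(timestamps)[value_col]
    return series.resample(FREQUENCIES[period]).agg(["min", "max", "mean"])


def write_report(df, time_col, value_col, path):
    """Write one sheet per period to an .xlsx file (needs an engine such as openpyxl)."""
    with pd.ExcelWriter(path) as writer:
        for period in FREQUENCIES:
            table = aggregate_weather(df, time_col, value_col, period)
            # Excel cannot store timezone-aware timestamps, so drop the zone.
            table.index = table.index.tz_localize(None)
            table.to_excel(writer, sheet_name=period)


# Tiny made-up sample with hypothetical column names.
sample = pd.read_csv(io.StringIO(
    "timestamp,temperature\n"
    "2006-04-01 00:00:00,9.0\n"
    "2006-04-01 12:00:00,11.0\n"
    "2006-04-02 00:00:00,7.0\n"
))
daily = aggregate_weather(sample, "timestamp", "temperature", "daily")
```

Note that normalising to UTC shifts day boundaries relative to local time; if the report must follow local calendar days, convert the index to the local zone before resampling.
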
...@@ -90,17 +101,25 @@ Sample Document:
|2006-04-01 00:00:00.000 +0200|Partly Cloudy|rain |9.472222222 |7.388888889 |0.89 |14.1197 |251 |15.8263 |0 |1015.13 |
### :wrench: Tools to use:
1. Pycharm / VSCode
2. Pandas
3. Kairos
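
If the aggregation is pushed down to Kairos instead of Pandas, the query body for its REST endpoint `/api/v1/datapoints/query` might be built as follows. This is a sketch under assumptions: the metric name used in the usage note and the base URL are placeholders, and the helper names are hypothetical. Each aggregator gets its own metric entry because Kairos chains the aggregators listed inside a single metric rather than running them side by side.

```python
import json
import urllib.request

# Kairos sampling units for the three report periods.
SAMPLING_UNITS = {"daily": "days", "weekly": "weeks", "monthly": "months"}


def build_query(metric, start_ms, end_ms, period):
    """Build a query body asking for min, max and avg per sampling period."""
    sampling = {"value": 1, "unit": SAMPLING_UNITS[period]}
    return {
        "start_absolute": start_ms,  # epoch milliseconds
        "end_absolute": end_ms,
        "metrics": [
            {"name": metric, "aggregators": [{"name": name, "sampling": sampling}]}
            for name in ("min", "max", "avg")
        ],
    }


def run_query(base_url, body):
    """POST the query to a Kairos instance and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/api/v1/datapoints/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `run_query("http://localhost:8080", build_query("weather.temperature", start_ms, end_ms, "daily"))` would return one result set per aggregator, assuming a metric of that name has been ingested.
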
### :mag: References:
* [How to query Kairos DB using Metrics](https://kairosdb.github.io/docs/restapi/QueryMetrics.html)
* [kairosdb-python package on PyPI](https://pypi.org/project/kairosdb-python/)

------------------------------------------------------
## :pushpin: Task 3: Working with MQTT & REDIS
### :golf: Areas covered:
- MQTT Protocol
- Caching using Redis DB
### :books: Description
In a theatre, tickets are to be given to people in a queue. There are 2 counters, for Gold and Silver class.
...@@ -111,7 +130,7 @@ Store the information of booking, Maximum 5 tickets allowed in a booking.
- Publish a message to an MQTT topic.
- Subscribe to the messages, assign seats per requirement and store the information in Redis DB.
sample format:
```json
...@@ -122,17 +141,18 @@ sample format:
}
```
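
The publishing side could look roughly like the sketch below. The payload fields (`name`, `class`, `tickets`), the topic names and the function names are hypothetical, because the full sample format is not reproduced here; only the limit of 5 tickets per booking comes from the task description. It uses the one-shot `publish.single` helper from `paho-mqtt` and assumes a broker is reachable on `hostname`.

```python
import json

MAX_TICKETS_PER_BOOKING = 5  # limit stated in the task description
# Hypothetical topic names, one per counter.
TOPICS = {"gold": "theatre/bookings/gold", "silver": "theatre/bookings/silver"}


def build_booking(name, ticket_class, tickets):
    """Validate a booking request and return (topic, JSON payload)."""
    ticket_class = ticket_class.lower()
    if ticket_class not in TOPICS:
        raise ValueError(f"unknown ticket class: {ticket_class}")
    if not 1 <= tickets <= MAX_TICKETS_PER_BOOKING:
        raise ValueError(f"a booking must contain 1 to {MAX_TICKETS_PER_BOOKING} tickets")
    payload = json.dumps({"name": name, "class": ticket_class, "tickets": tickets})
    return TOPICS[ticket_class], payload


def publish_booking(name, ticket_class, tickets, hostname="localhost"):
    """Publish one booking request to the topic of its counter."""
    # Imported lazily so that validation can be used without paho-mqtt installed.
    import paho.mqtt.publish as publish

    topic, payload = build_booking(name, ticket_class, tickets)
    publish.single(topic, payload, qos=1, hostname=hostname)
```

Validating before publishing keeps oversized bookings off the topic, so consumers only ever see requests they can fulfil.
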
Use Redis for caching/storing information.
Create consumers which consume data from these topics and store it in a Redis db.
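
A consumer might be sketched as follows. The payload fields, topic names, Redis key layout and function names are all assumptions made for illustration. Seat numbers are handed out consecutively per class through Redis `INCRBY`, which is one simple way to keep two consumers from assigning the same seat; the task itself does not prescribe an allocation scheme. `redis_client` is expected to be a `redis.Redis` instance.

```python
import json

# Hypothetical topic names, one per counter.
TOPICS = ["theatre/bookings/gold", "theatre/bookings/silver"]


def store_booking(redis_client, payload):
    """Assign consecutive seat numbers for one booking and save it in Redis."""
    booking = json.loads(payload)
    ticket_class, tickets = booking["class"], int(booking["tickets"])
    # INCRBY is atomic, so concurrent consumers never hand out the same seat.
    last_seat = redis_client.incrby(f"seats:{ticket_class}", tickets)
    seats = list(range(last_seat - tickets + 1, last_seat + 1))
    key = f"booking:{ticket_class}:{seats[0]}"
    redis_client.hset(key, mapping={"name": booking["name"], "seats": json.dumps(seats)})
    return key, seats


def run_consumer(redis_client, hostname="localhost"):
    """Block forever, storing every booking that arrives on the topics."""
    # Imported lazily; needs paho-mqtt installed and a running broker.
    import paho.mqtt.subscribe as subscribe

    def on_message(client, userdata, message):
        store_booking(redis_client, message.payload)

    subscribe.callback(on_message, TOPICS, qos=1, hostname=hostname)
```

It could be started with something like `run_consumer(redis.Redis(host="localhost", port=6379, decode_responses=True))`, assuming Redis and the broker both run locally.
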
### :wrench: Tools to use:
1. Pycharm / VSCode
2. MQTT - (PIP package: `paho-mqtt`)
3. REDIS - (PIP package: `redis`)
### :mag: References:
* [Using MQTT in Python](https://www.emqx.com/en/blog/how-to-use-mqtt-in-python)
* [Connection to Redis in Python](https://docs.redis.com/latest/rs/references/client_references/client_python/)