Sales Forecasting and EDA Challenge
This challenge involves two tasks: weekly sales forecasting and Exploratory Data Analysis (EDA).
Time series analysis studies time-indexed data to extract patterns and other characteristics that support prediction. Time series forecasting uses a model to predict future values over a short horizon from previous observations. It is widely used with non-stationary data such as economic data, weather data, stock prices, and retail sales.
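As a minimal illustration of forecasting from previous observations, the sketch below predicts each of the next `horizon` days as the mean of the last `window` observed values (a moving-average baseline). This generic baseline is chosen only for illustration and is not a method prescribed by the challenge; the function name and default window are assumptions.

```python
from statistics import fmean


def moving_average_forecast(history: list[float], horizon: int, window: int = 7) -> list[float]:
    """Forecast `horizon` future values, each equal to the mean of the last `window` observations."""
    if not history:
        raise ValueError("history must contain at least one observation")
    # Use at most `window` of the most recent observations.
    recent = history[-window:]
    level = fmean(recent)
    # Flat forecast: every future step receives the same value.
    return [level] * horizon


daily_sales = [120.0, 80.0, 100.0, 95.0, 130.0, 70.0, 105.0, 110.0]
print(moving_average_forecast(daily_sales, horizon=3, window=4))
# [103.75, 103.75, 103.75]
```

A baseline like this is mainly useful as a reference point: a trained model should beat it on held-out data before it is worth submitting.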
The challenge is divided into two parts:
The first task is to provide an Exploratory Data Analysis (EDA) of the given data, as a Jupyter notebook (.ipynb) and its corresponding .pdf.
- A sample Jupyter notebook, "Chicago Crime Dataset Sample EDA.ipynb" (with its .pdf version), is provided to give a basic understanding of EDA. It uses the Chicago Crime Dataset, which is unrelated to the actual training data and serves only as an example of the task. The actual training data is given in a separate CSV file, "train.csv".
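A typical first EDA step is to count rows, check missing values per column, and summarize the target. The sketch below does this with only the Python standard library so it runs without extra dependencies; in a notebook, pandas is the more usual choice. The function name, the set of strings treated as missing, and the tiny inline sample are assumptions for illustration, not real data from "train.csv".

```python
import csv
import io
from statistics import fmean, median

MISSING = {"", "nan", "NaN"}  # Assumption: how missing cells appear in the CSV.


def summarize(rows: list[dict[str, str]], target: str = "Sales") -> dict:
    """Row count, per-column missing-value counts, and basic statistics of `target`."""
    columns = list(rows[0]) if rows else []
    # `or ""` also covers None, which csv.DictReader uses for short rows.
    missing = {col: sum(1 for r in rows if (r.get(col) or "").strip() in MISSING) for col in columns}
    values = [float(r[target]) for r in rows if (r.get(target) or "").strip() not in MISSING]
    if not values:
        raise ValueError(f"no usable values in column {target!r}")
    return {
        "rows": len(rows),
        "missing": missing,
        "min": min(values),
        "max": max(values),
        "mean": fmean(values),
        "median": median(values),
    }


# Hypothetical stand-in for train.csv. For the real file use:
#   with open("train.csv", newline="") as f:
#       rows = list(csv.DictReader(f))
sample = io.StringIO(
    "Order Date,Ship Date,Sales\n"
    "2018-06-18,2018-06-20,100.0\n"
    "2018-06-19,,50.0\n"
    "2018-06-20,2018-06-22,30.0\n"
)
rows = list(csv.DictReader(sample))
stats = summarize(rows)
print(stats)
```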
The second task is to design a model that predicts sales for the next week from previous observations.
- The week consists of the 7 dates that follow the latest "Order Date".
- Example: if the latest "Order Date" is 2018-06-20, the prediction dates are 2018-06-21 through 2018-06-27.
- "sample-output.csv" is provided as an example of the final output file.
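The date rule above can be sketched as follows. The dates are parsed before taking the maximum, because comparing date strings only orders them correctly in ISO form. The function name is hypothetical, and the default `fmt` assumes the ISO format used in the example; adjust it to whatever format the CSV actually uses.

```python
from datetime import date, datetime, timedelta


def prediction_dates(order_dates: list[str], fmt: str = "%Y-%m-%d", days: int = 7) -> list[date]:
    """Return the `days` calendar dates that follow the latest order date."""
    # Parse first so the maximum is taken over real dates, not strings.
    latest = max(datetime.strptime(d, fmt).date() for d in order_dates)
    return [latest + timedelta(days=i) for i in range(1, days + 1)]


dates = prediction_dates(["2018-06-18", "2018-06-20", "2018-06-19"])
print([d.isoformat() for d in dates])
# ['2018-06-21', '2018-06-22', '2018-06-23', '2018-06-24', '2018-06-25', '2018-06-26', '2018-06-27']
```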
The Superstore Sales Dataset is used for this challenge.
- It has 18 attributes, with "Sales" as the target attribute for prediction.
NOTE: Some values of the "Ship Date" attribute can be NaN because those products had not been shipped by the provided dates.
The training data is given in the file "train.csv".
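Assuming each row of "train.csv" is an individual order line, a common first step toward a forecastable series is to sum "Sales" per "Order Date". The sketch below does this and fills calendar days that have no orders with zero, which is an assumption rather than a challenge rule. "Ship Date" is not used, so its NaN values do not affect the totals. The function name, the date format, and the inline sample are hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import date, datetime, timedelta


def daily_sales(rows, date_col: str = "Order Date", target: str = "Sales",
                fmt: str = "%Y-%m-%d") -> list[tuple[date, float]]:
    """Sum `target` per calendar day of `date_col`, covering every day in the range."""
    totals: dict[date, float] = defaultdict(float)
    for row in rows:
        day = datetime.strptime(row[date_col], fmt).date()
        totals[day] += float(row[target])
    if not totals:
        raise ValueError("no rows to aggregate")
    series = []
    day, end = min(totals), max(totals)
    while day <= end:
        # Assumption: a day with no orders contributes zero sales.
        series.append((day, totals.get(day, 0.0)))
        day += timedelta(days=1)
    return series


sample = io.StringIO(
    "Order Date,Ship Date,Sales\n"
    "2018-06-18,2018-06-20,100.0\n"
    "2018-06-18,,20.0\n"
    "2018-06-20,2018-06-23,30.0\n"
)
series = daily_sales(csv.DictReader(sample))
for day, total in series:
    print(day.isoformat(), total)
```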
LICENSE: Not specified
This dataset will not be used for any commercial product development. It is provided for research analysis only.
Please READ the Submission Guidelines carefully, as submissions in an incorrect format will be rejected.
The final submission should include:
- Exploratory Data Analysis (EDA) notebook (.ipynb and corresponding .pdf) providing a detailed analysis of "train.csv" (Note: EDA is required only for the training file, not the test files)
- Model files (Python-based)
- requirements.txt (listing the modules required to run your submission)
- Forecasting model training files (.ipynb)
- Code execution script for weekly sales prediction, named run.py (.py only), run as:
python run.py test.csv
"run.py" should accept a single unnamed (positional) argument: the test file, located in the same directory as "run.py". It should generate the sales prediction for the next week, based on the latest "Order Date" in "test.csv". A sample output is provided along with the training data.
The script should write its output to "output.csv", with "Prediction Date" and "Predicted Sales" values.
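The required interface can be sketched as a minimal run.py skeleton. Only the file names, the command line, the "Order Date" and "Sales" attributes, and the "Prediction Date" / "Predicted Sales" fields come from the guidelines. Everything else is an assumption: the function names, `DATE_FMT`, the ISO date output, the two-decimal formatting, and the placeholder moving-average model, which stands in for a trained model. Check "sample-output.csv" for the exact expected format before relying on it.

```python
"""Hypothetical run.py skeleton: `python run.py test.csv` writes output.csv."""
import csv
import sys
from collections import defaultdict
from datetime import date, datetime, timedelta
from statistics import fmean

DATE_FMT = "%Y-%m-%d"  # Assumption: match this to the date format in test.csv.


def load_daily_sales(path: str) -> dict[date, float]:
    """Sum "Sales" per "Order Date" from the test file."""
    totals: dict[date, float] = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            day = datetime.strptime(row["Order Date"], DATE_FMT).date()
            totals[day] += float(row["Sales"])
    return totals


def forecast(totals: dict[date, float], horizon: int = 7, window: int = 7) -> list[tuple[date, float]]:
    """Placeholder model: flat forecast at the mean of the last `window` days.

    Days without orders count as zero. Replace this with the trained model.
    """
    latest = max(totals)
    recent = [totals.get(latest - timedelta(days=i), 0.0) for i in range(window)]
    level = fmean(recent)
    # The 7 prediction dates start the day after the latest "Order Date".
    return [(latest + timedelta(days=i), level) for i in range(1, horizon + 1)]


def main(test_path: str, output_path: str = "output.csv") -> None:
    predictions = forecast(load_daily_sales(test_path))
    with open(output_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Prediction Date", "Predicted Sales"])
        for day, value in predictions:
            writer.writerow([day.strftime(DATE_FMT), f"{value:.2f}"])


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: python run.py test.csv", file=sys.stderr)
```

Keeping `main` separate from the command-line handling lets the training notebook import and test the prediction path without spawning a subprocess.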
Other files may be included for purposes of code modularity.
The judging criteria and rules are:
- Prediction accuracy, measured by RMSE (root mean squared error)
- Exploratory Data Analysis (EDA) report
- Submissions must not include copyrighted code; a submission found in violation will be rejected.
- Submissions should follow the format described in the Submission Guidelines.
- Late submissions will not be accepted after the stated deadline (Indian Standard Time).
- Candidates will be invited to an internship interview based on their performance.
- Certificates will be provided upon successful submission of a solution to this challenge from dockship.io.
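Accuracy is judged by RMSE, the square root of the mean squared difference between actual and predicted values. To check a model locally before submitting, one option is to hold out the final week of "train.csv" and compute RMSE on it; that validation setup is an assumption, not something the challenge specifies. A minimal sketch, with a hypothetical function name:

```python
import math


def rmse(actual: list[float], predicted: list[float]) -> float:
    """Root mean squared error between two equal-length, non-empty sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


print(rmse([100.0, 200.0, 300.0], [110.0, 190.0, 300.0]))
```

Because the errors are squared before averaging, a few days with large misses raise RMSE much more than many small misses do.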