Architecting Big Data Solutions - Use Cases and Scenarios

Kumaran Ponnambalam, Dedicated to Data Science Education

37 Lessons (5h 21m)
    • 1. Intro to ABDS

      4:26
    • 2. Traditional Data Solutions

      11:34
    • 3. Big Data Solutions

      7:57
    • 4. Current Big Data Trends

      8:33
    • 5. Intro to Big Data Solutions

      11:53
    • 6. Architecture template

      6:22
    • 7. Intro to Technology options

      5:32
    • 8. Challenges with Big Data Technologies

      8:55
    • 9. Acquire overview

      9:42
    • 10. Acquire options SQL and Files

      8:23
    • 11. Acquire Options REST and Streaming

      8:28
    • 12. Transport Overview

      9:55
    • 13. Transport Options SFTP and Sqoop

      11:44
    • 14. Transport Options Flume and Kafka

      10:01
    • 15. Persistence Overview

      9:58
    • 16. Persist Options RDBMS and HDFS

      11:36
    • 17. Persist Options Cassandra and MongoDB

      11:48
    • 18. Persist Options Neo4j and ElasticSearch

      8:53
    • 19. Transformation module

      10:39
    • 20. Transform Options MapReduce and SQL

      11:12
    • 21. Transform Options Spark and ETL Products

      11:42
    • 22. Reporting module

      8:58
    • 23. Reporting Options Impala and Spark SQL

      7:17
    • 24. Reporting Options Third Party and Elastic

      5:53
    • 25. Advanced Analytics Overview

      10:01
    • 26. Advanced Analytics Options R and Python

      7:27
    • 27. Advanced Analytics Apache Spark and Commercial Software

      6:33
    • 28. Use Case 1 Enterprise Data Backup

      6:17
    • 29. Use Case 2 Media File Store

      7:36
    • 30. Use Case 3 Social Media Sentiment Analysis

      9:50
    • 31. Use Case 4 Credit Card Fraud Detection

      10:00
    • 32. Use Case 5 Operations Analytics

      11:28
    • 33. Use Case 6 News Articles Recommendations

      7:54
    • 34. Use Case 7 Customer 360

      9:47
    • 35. Use Case 8 IOT Connected Car

      8:05
    • 36. Transitioning to Big Data

      3:23
    • 37. Closing Remarks ABDS

      1:38

Project Description

Practice: Architect a Spam filtering solution

This is a practice exercise in which you analyze a problem and come up with an architecture, much like the use cases you studied in the course. You don’t have to “match” the proposed solution. Be creative and thorough in your analysis and in the solution you design.

XYZ Corporation provides software products to customers worldwide. Its customers email support requests every day, averaging about 150K emails per day. Because the support address ( [email protected]) is open, spammers also send email to it every day, and spam accounts for 10% of the emails received. To avoid the extra labor of filtering spam manually, XYZ Corporation asks you to architect a Spam filtering solution.
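As a rough sizing aid, the stated figures (about 150K emails per day, 10% of them spam) can be turned into daily and per-second volumes. The sketch below is only back-of-the-envelope arithmetic: it assumes emails arrive evenly over 24 hours, which real traffic will not do, and the function and constant names are illustrative rather than part of the exercise.

```python
# Back-of-the-envelope sizing from the figures in the problem statement.
# Assumption: arrivals are spread evenly across the day (real traffic has peaks).

EMAILS_PER_DAY = 150_000      # "about 150K emails per day"
SPAM_FRACTION = 0.10          # "Spam accounts for 10% of the emails received"
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volumes(emails_per_day: int, spam_fraction: float):
    """Return (spam per day, ham per day, average emails per second)."""
    spam = round(emails_per_day * spam_fraction)
    ham = emails_per_day - spam
    avg_per_second = emails_per_day / SECONDS_PER_DAY
    return spam, ham, avg_per_second


spam, ham, rate = daily_volumes(EMAILS_PER_DAY, SPAM_FRACTION)
print(f"spam/day: {spam}, ham/day: {ham}, average rate: {rate:.2f} emails/s")
```

An average of under two emails per second is modest, so under these assumptions the design pressure comes more from the real-time and multi-location requirements than from raw volume.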

Here are some additional tips for you.

1. Multiple email servers, located across 7 locations around the world, receive the emails.

2. The email server supports a REST API to stream new emails.

3. A Spam identification model needs to be built using machine learning. A Data Scientist will help you build the model based on past data.

4. The model then needs to be used to classify each email as Spam or Ham in real time, and the classification needs to be stored in a database along with the message and its metadata.
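To make the classify-and-store step in the last tip concrete, here is a minimal sketch in Python. It is not the proposed solution, and every name in it is hypothetical: a keyword check stands in for the machine-learning model, an in-memory SQLite table stands in for the database, and a plain list stands in for emails streamed from the servers' REST API.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class Email:
    """Hypothetical record for one email pulled from a mail server."""
    message_id: str
    sender: str
    subject: str
    body: str
    server_location: str  # which receiving location delivered the email


# Toy stand-in for the trained model; a real model would be built from past data.
SPAM_HINTS = ("lottery", "winner", "free money")


def toy_classifier(email: Email) -> str:
    """Label an email 'spam' if it contains any hint phrase, else 'ham'."""
    text = f"{email.subject} {email.body}".lower()
    return "spam" if any(hint in text for hint in SPAM_HINTS) else "ham"


def open_store() -> sqlite3.Connection:
    """Create an in-memory table holding message, metadata, and label."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE classified_email (
               message_id TEXT PRIMARY KEY,
               sender TEXT,
               subject TEXT,
               body TEXT,
               server_location TEXT,
               label TEXT
           )"""
    )
    return conn


def classify_and_store(
    emails: Iterable[Email],
    classify: Callable[[Email], str],
    conn: sqlite3.Connection,
) -> None:
    """Classify each incoming email and persist it together with its label."""
    for email in emails:
        label = classify(email)
        conn.execute(
            "INSERT INTO classified_email VALUES (?, ?, ?, ?, ?, ?)",
            (email.message_id, email.sender, email.subject,
             email.body, email.server_location, label),
        )
    conn.commit()


def label_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    """Return the number of stored emails per label."""
    rows = conn.execute(
        "SELECT label, COUNT(*) FROM classified_email GROUP BY label"
    ).fetchall()
    return dict(rows)
```

Passing the classifier in as a function keeps the pipeline independent of the model, so the toy rule could be swapped for a trained model without touching the storage code. Your own architecture may use entirely different components for each stage.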

A proposed solution is attached as a resource to the Project.
