AWS Big Data Engineer Resume
PROFESSIONAL SUMMARY:
Professional Big Data Engineer with 2.5 years of industry experience in big data technologies. Expertise in the Hadoop/Spark and AWS big data environments, with outstanding communication skills and a proven ability to work in a fast-paced environment. Enthusiastically seeking a Data Engineer role where I can leverage existing skills and learn new ones.
TECHNICAL SKILLS:
Programming languages: Python, Scala, C, C++, Java
RDBMS/NoSQL: MySQL, PostgreSQL, HBase, DynamoDB
Hadoop Ecosystem: Hive, Pig, Sqoop, Oozie, Impala, MapReduce, Flume
Apache Spark: Spark Core, DataFrames, Spark SQL, Spark Streaming, Kafka
Operating systems: Linux, Windows.
Machine Learning: scikit-learn, NumPy, pandas, linear regression, polynomial regression, KNN, logistic regression, Naive Bayes, LDA, PCA, XGBoost, AdaBoost
Visualization tools: Tableau, QuickSight
ETL tools: Talend
Cloud Technologies: AWS, Google Cloud Platform (GCP)
WORK HISTORY:
Confidential
AWS Big Data Engineer
Responsibilities:
- In-depth understanding of Hadoop architecture and its components, including HDFS, Application Master, NameNode, DataNode, and MapReduce concepts
- Migrated existing data from mainframes/SQL Server to an AWS Hadoop/EMR cluster and performed ETL jobs on it
- Implemented scalable data pipelines in Spark (Scala) to perform data transfer, aggregation, transformation, and mining, and loaded the results into S3 and Redshift (see the migration-and-load sketch after this list)
- Developed a prototype of the order-history feature for the user mobile app using Kinesis Data Streams, persisting the data into DynamoDB (see the DynamoDB sketch after this list)
- Implemented a transaction alarm for unexpected orders: Kinesis Data Streams and Kinesis Data Analytics monitor incoming orders, and a Lambda function fires an alarm to the user's cell phone through Amazon SNS when something unusual happens (see the alarm sketch after this list)
- Stored server logs and order history in Redshift and used QuickSight to build analyses and visualizations
- Developed a prototype for server-log analysis using Kinesis Data Firehose to pump the data into Amazon Elasticsearch Service and performed the analysis in Kibana (see the Firehose sketch after this list)
- Implemented an Amazon Machine Learning model to predict the quantity a user is likely to order for a specific item, enabling automatic suggestions, with data fed through Kinesis Data Firehose into an S3 data lake
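A minimal sketch of the migration-and-load pattern from the Spark bullets above. The SQL Server connection, table names, S3 paths, and Redshift details are placeholders, and the Redshift write assumes the community spark-redshift connector and the Microsoft JDBC driver are on the classpath:

```scala
import org.apache.spark.sql.{SparkSession, functions => F}

object OrdersMigration {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("orders-migration").getOrCreate()

    // Pull the source table out of SQL Server over JDBC (placeholder connection)
    val orders = spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://legacy-db:1433;databaseName=sales")
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      .option("dbtable", "dbo.orders")
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .load()

    // Aggregate daily totals per customer as part of the ETL step
    val daily = orders
      .groupBy(F.col("customer_id"), F.to_date(F.col("order_ts")).as("order_date"))
      .agg(F.sum("amount").as("daily_total"), F.count(F.lit(1)).as("order_count"))

    // Land the curated data on S3 as Parquet
    daily.write.mode("overwrite").parquet("s3a://example-bucket/curated/daily_orders/")

    // Load the same aggregate into Redshift via the spark-redshift connector
    daily.write
      .format("io.github.spark_redshift_community.spark.redshift")
      .option("url", "jdbc:redshift://example-cluster:5439/dev?user=etl&password=***")
      .option("dbtable", "analytics.daily_orders")
      .option("tempdir", "s3a://example-bucket/tmp/redshift/")
      .option("forward_spark_s3_credentials", "true")
      .mode("append")
      .save()

    spark.stop()
  }
}
```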
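A sketch of the DynamoDB write behind the order-history prototype, using the AWS SDK for Java v2 from Scala; the table name, key schema, and attribute names are assumptions:

```scala
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.dynamodb.DynamoDbClient
import software.amazon.awssdk.services.dynamodb.model.{AttributeValue, PutItemRequest}

import scala.jdk.CollectionConverters._

object OrderHistoryWriter {
  // One client reused for all writes; the region is illustrative
  private val dynamo = DynamoDbClient.builder().region(Region.US_EAST_1).build()

  /** Persist a single order event (as consumed from the Kinesis stream) into a
    * hypothetical OrderHistory table keyed by customer_id and order_ts. */
  def saveOrder(customerId: String, orderTs: String, itemId: String, quantity: Int): Unit = {
    val item = Map(
      "customer_id" -> AttributeValue.builder().s(customerId).build(),
      "order_ts"    -> AttributeValue.builder().s(orderTs).build(),
      "item_id"     -> AttributeValue.builder().s(itemId).build(),
      "quantity"    -> AttributeValue.builder().n(quantity.toString).build()
    ).asJava

    dynamo.putItem(
      PutItemRequest.builder()
        .tableName("OrderHistory")
        .item(item)
        .build()
    )
  }
}
```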
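A sketch of the alarm path, assuming the Lambda is triggered by a Kinesis stream carrying the anomaly records emitted by Kinesis Data Analytics; the topic ARN, event wiring, and payload format are assumptions:

```scala
import java.nio.charset.StandardCharsets

import com.amazonaws.services.lambda.runtime.{Context, RequestHandler}
import com.amazonaws.services.lambda.runtime.events.KinesisEvent
import software.amazon.awssdk.services.sns.SnsClient
import software.amazon.awssdk.services.sns.model.PublishRequest

import scala.jdk.CollectionConverters._

/** Lambda handler: for every anomaly record on the stream, publish an alarm to
  * the (placeholder) order-alarms SNS topic, which fans out to the user's phone. */
class OrderAlarmHandler extends RequestHandler[KinesisEvent, Unit] {

  private val sns = SnsClient.create()
  private val topicArn = "arn:aws:sns:us-east-1:123456789012:order-alarms" // placeholder

  override def handleRequest(event: KinesisEvent, context: Context): Unit = {
    event.getRecords.asScala.foreach { record =>
      // The anomaly payload produced upstream is assumed to be a small JSON
      // document; it is forwarded to subscribers verbatim.
      val payload = StandardCharsets.UTF_8.decode(record.getKinesis.getData).toString

      sns.publish(
        PublishRequest.builder()
          .topicArn(topicArn)
          .subject("Unusual order activity detected")
          .message(payload)
          .build()
      )
    }
  }
}
```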
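A sketch of how an application host could push raw server-log lines into the Firehose delivery stream that feeds the Elasticsearch domain behind Kibana; the stream name is a placeholder:

```scala
import software.amazon.awssdk.core.SdkBytes
import software.amazon.awssdk.services.firehose.FirehoseClient
import software.amazon.awssdk.services.firehose.model.{PutRecordRequest, Record}

object LogShipper {
  private val firehose = FirehoseClient.create()

  /** Send one log line to the (placeholder) delivery stream; Firehose buffers
    * the records and delivers them to the Elasticsearch domain for Kibana. */
  def ship(logLine: String): Unit = {
    firehose.putRecord(
      PutRecordRequest.builder()
        .deliveryStreamName("server-logs-to-es")
        .record(Record.builder().data(SdkBytes.fromUtf8String(logLine + "\n")).build())
        .build()
    )
  }
}
```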
Confidential
Hadoop/Spark Developer
Responsibilities:
- Developed Spark programs using the Scala API for faster testing and processing of data
- Transformed and retrieved data using Spark, Impala, Pig, and Hive
- Imported/exported data between AWS S3 and Spark RDDs and performed transformations and actions on those RDDs (see the RDD sketch after this list)
- Tuned Spark application performance by setting the right batch interval, the correct level of parallelism, and appropriate memory configuration (see the tuning sketch after this list)
- Handled large datasets during the ingestion process itself using partitioning, Spark's in-memory capabilities, broadcast variables, and effective, efficient joins and transformations
- Developed a data pipeline on AWS to extract data from weblogs and store it in HDFS
- Used Sqoop to import/export data between various RDBMSs and the HDFS cluster and Hive tables, and designed daily Sqoop incremental-import jobs
- Created Hive tables, loaded them with data, and ran various HQL queries (see the Hive sketch after this list)
- Worked with file formats such as Avro, Parquet, ORC, and SequenceFile, and compression codecs such as Snappy (see the file-format sketch after this list)
- Developed a data pipeline using Flume, Sqoop, and Pig to ingest customer behavioral data and purchase histories into HDFS for analysis
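A minimal sketch of the S3-to-RDD pattern referenced above; the bucket, paths, and the assumption that the logs follow a common-log-format layout are all illustrative:

```scala
import org.apache.spark.sql.SparkSession

object WeblogRddJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("weblog-rdd-job").getOrCreate()
    val sc = spark.sparkContext

    // Read raw weblog lines from S3 into an RDD (placeholder path)
    val logs = sc.textFile("s3a://example-bucket/weblogs/*/*.log")

    // Transformations: keep server errors and pull out the request path,
    // assuming a common-log-format layout
    val errorPaths = logs
      .filter(_.contains(" 500 "))
      .map(_.split(" "))
      .filter(_.length > 6)
      .map(fields => fields(6))

    // Actions: count errors per path and bring the top offenders to the driver
    errorPaths
      .map(path => (path, 1L))
      .reduceByKey(_ + _)
      .sortBy(_._2, ascending = false)
      .take(20)
      .foreach { case (path, count) => println(s"$path -> $count") }

    spark.stop()
  }
}
```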
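A sketch of the tuning ideas mentioned above applied to a batch join (the Spark Streaming batch interval is set analogously on the StreamingContext and is not shown); all config values, paths, and column names are illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object TunedJoinJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("tuned-join-job")
      .config("spark.sql.shuffle.partitions", "200")  // level of parallelism for shuffles
      .config("spark.executor.memory", "4g")          // executor memory sizing
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .getOrCreate()

    val transactions = spark.read.parquet("s3a://example-bucket/curated/transactions/")
    val customers    = spark.read.parquet("s3a://example-bucket/curated/customers/")

    // Broadcast the small dimension table so the join avoids a full shuffle
    val enriched = transactions.join(broadcast(customers), Seq("customer_id"))

    // Cache the joined data because several downstream aggregations reuse it
    enriched.cache()

    enriched
      .groupBy("customer_segment")
      .count()
      .write
      .mode("overwrite")
      .parquet("s3a://example-bucket/curated/segment_counts/")

    spark.stop()
  }
}
```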
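A sketch of the Hive table creation and HQL querying; the same HQL ran in the Hive CLI, but it is shown here through Spark's Hive support to stay in one language, and the table layout is an assumption:

```scala
import org.apache.spark.sql.SparkSession

object PurchaseHiveTables {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("purchase-hive-tables")
      .enableHiveSupport()            // lets spark.sql run HQL against the metastore
      .getOrCreate()

    // External, partitioned Hive table over the ingested purchase data
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS purchases (
        |  customer_id STRING,
        |  item_id     STRING,
        |  amount      DOUBLE
        |)
        |PARTITIONED BY (purchase_date STRING)
        |STORED AS PARQUET
        |LOCATION '/data/purchases'""".stripMargin)

    // Register a newly landed partition, then query it with ordinary HQL
    spark.sql("ALTER TABLE purchases ADD IF NOT EXISTS PARTITION (purchase_date = '2016-01-01')")

    spark.sql(
      """SELECT customer_id, SUM(amount) AS total_spend
        |FROM purchases
        |WHERE purchase_date = '2016-01-01'
        |GROUP BY customer_id
        |ORDER BY total_spend DESC
        |LIMIT 10""".stripMargin)
      .show()

    spark.stop()
  }
}
```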
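A sketch of the file-format and compression handling; paths are placeholders, and the Avro read assumes the spark-avro package is available:

```scala
import org.apache.spark.sql.SparkSession

object FormatConversion {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("format-conversion").getOrCreate()

    // Avro input (requires the spark-avro package on the classpath)
    val events = spark.read.format("avro").load("s3a://example-bucket/raw/events_avro/")

    // Columnar copies for analytics: Parquet and ORC, both Snappy-compressed
    events.write
      .mode("overwrite")
      .option("compression", "snappy")
      .parquet("s3a://example-bucket/curated/events_parquet/")

    events.write
      .mode("overwrite")
      .option("compression", "snappy")
      .orc("s3a://example-bucket/curated/events_orc/")

    spark.stop()
  }
}
```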