
Big Data Engineer Resume


Phoenix, AZ

SUMMARY:

  • 5 years of hands-on experience in Big Data technologies and Machine Learning algorithms, working with highly distributed systems and massive volumes of data using MapR and Cloudera Hadoop distributions.
  • Solid understanding of Hadoop and YARN architecture and the workings of the Hadoop framework, including the Hadoop Distributed File System (HDFS) and technologies such as MapReduce, Pig, Hive, HBase, Flume, Sqoop, ZooKeeper, Oozie, Storm, Spark, and Kafka.
  • Worked on real-time data integration using Kafka data pipelines and Spark Streaming with NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Experienced in using PL/SQL to write Stored Procedures, Functions and Triggers.
  • Extensive knowledge of developing Spark Streaming jobs using RDDs (Resilient Distributed Datasets), leveraging PySpark and the Spark shell as appropriate. Good knowledge of job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
  • Familiar with Amazon Web Services, including provisioning and maintaining AWS resources such as EMR, S3 buckets, EC2 instances, and RDS.
  • Implemented machine learning models such as Random Forests (classification), K-Means clustering, KNN (k-nearest neighbors), Naive Bayes, SVM (Support Vector Machines), Decision Trees, and Linear and Logistic Regression.
  • Experienced in all phases of Software Development Life Cycle (SDLC). Worked in Agile environment.

TECHNICAL SKILLS:

OPERATING SYSTEMS: Windows/UNIX

LANGUAGES: Python, Scala, Java, Shell, SQL

RELATIONAL DATABASES: MySQL, Oracle, SQL Server

NoSQL DATABASES: Cassandra, AWS DynamoDB

VERSION CONTROL: GIT, SVN

CLOUD: AWS EMR, AWS EC2, AWS S3, RDS

BIG DATA ECOSYSTEM: HDFS, Pig, MapReduce, YARN, Hive, Sqoop, Flume, Oozie, HBase

DATA INGESTION: Sqoop, Kafka, Flume

DATA PROCESSING: Spark, Hive, MapReduce

MACHINE LEARNING: Spark MLlib, TensorFlow, scikit-learn, Keras, NLTK

WEB TECHNOLOGIES: HTML, XML, JavaScript, jQuery

PROFESSIONAL EXPERIENCE:

Confidential, Phoenix, AZ

Big Data Engineer

Responsibilities:

  • Designed and implemented big data ingestion pipelines to ingest data from various data sources using Kafka and Spark Streaming, including data quality checks and transformations, and stored the output in efficient storage formats; persisted the streaming data to HDFS and Cassandra using Scala (a minimal ingestion sketch follows this list).
  • Documented the requirements, including the available code to be implemented using Spark, Hive, HDFS, HBase, and Elasticsearch.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Built models using statistical techniques and machine learning classification algorithms such as XGBoost, SVM, and Random Forest (see the classification sketch after this list). Designed and developed advanced predictive analytics models, and modeled and framed business scenarios that are meaningful and impact critical business processes and decisions.
  • Developed Scala scripts to read all the Parquet tables in a database and write them out as JSON files (see the Parquet-to-JSON sketch after this list).
  • Extensively used Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases (Oracle, MySQL) for predictive analytics.
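
For illustration only, a minimal Scala sketch of this style of Kafka ingestion using Spark Structured Streaming; the broker address, topic name, and HDFS paths are placeholders, and a Cassandra sink could be wired in the same way via the spark-cassandra-connector.

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions._

  object EventIngestion {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().appName("kafka-ingestion").getOrCreate()
      import spark.implicits._

      // Read the raw event stream from Kafka (broker and topic are placeholders).
      val raw = spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")
        .option("subscribe", "events")
        .load()

      // Basic quality check and transformation: cast to string, drop empty
      // records, and stamp each row with its ingestion time.
      val cleaned = raw
        .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
        .filter($"value".isNotNull && length($"value") > 0)
        .withColumn("ingest_ts", current_timestamp())

      // Persist the stream to HDFS in an efficient columnar format (Parquet).
      val query = cleaned.writeStream
        .format("parquet")
        .option("path", "hdfs:///data/events")               // placeholder path
        .option("checkpointLocation", "hdfs:///chk/events")  // placeholder path
        .outputMode("append")
        .start()

      query.awaitTermination()
    }
  }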
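
Also for illustration, a minimal classification sketch using Spark MLlib's RandomForestClassifier (the XGBoost and SVM variants would follow the same pipeline pattern); the input path, feature columns, and label column are hypothetical.

  import org.apache.spark.ml.Pipeline
  import org.apache.spark.ml.classification.RandomForestClassifier
  import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
  import org.apache.spark.ml.feature.VectorAssembler
  import org.apache.spark.sql.SparkSession

  object ClassificationModel {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().appName("rf-classifier").getOrCreate()

      // Hypothetical feature table with numeric columns and a binary "label" column.
      val data = spark.read.parquet("hdfs:///features/customers")
      val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42)

      // Assemble the feature columns into a single vector, then fit a Random Forest.
      val assembler = new VectorAssembler()
        .setInputCols(Array("tenure", "balance", "txn_count")) // placeholder columns
        .setOutputCol("features")
      val rf = new RandomForestClassifier()
        .setLabelCol("label")
        .setFeaturesCol("features")
        .setNumTrees(100)
      val model = new Pipeline().setStages(Array(assembler, rf)).fit(train)

      // Score the held-out split and report accuracy.
      val predictions = model.transform(test)
      val accuracy = new MulticlassClassificationEvaluator()
        .setLabelCol("label")
        .setPredictionCol("prediction")
        .setMetricName("accuracy")
        .evaluate(predictions)
      println(s"Test accuracy: $accuracy")

      spark.stop()
    }
  }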
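
And a minimal sketch of the Parquet-to-JSON export described above; the database name and output directory are placeholders.

  import org.apache.spark.sql.SparkSession

  object ParquetToJson {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("parquet-to-json")
        .enableHiveSupport()
        .getOrCreate()

      val db = "analytics"                 // placeholder database name
      val outRoot = "hdfs:///export/json"  // placeholder output directory

      // List every table registered in the database and rewrite each one as JSON.
      spark.catalog.listTables(db).collect().foreach { t =>
        spark.table(s"$db.${t.name}")
          .write.mode("overwrite")
          .json(s"$outRoot/${t.name}")
      }

      spark.stop()
    }
  }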

Technologies Used: Scala, Python, PySpark, HDFS, HBase, Cassandra, REST API, Hive, Pig, Pandas, NumPy, Unix shell, Apache Spark, Kafka, AWS EC2, S3, Redshift, EMR, Elasticsearch.

Confidential

Big Data Engineer

Responsibilities:

  • Worked on processing large volumes of data using big data analytics tools such as Hive, Sqoop, Pig, Flume, Oozie, CDH5, HBase, and Scala.
  • Used Sqoop to import and export data between a relational data source (MySQL) and HDFS.
  • Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and semi-structured data.
  • Used Apache Solr to search for specific products each cycle for the business.
  • Designed and documented the project use cases, wrote test cases, led the offshore team, and interacted with the client.
  • Used Git as version control to check out and check in files.
  • Loaded data into external tables using Hive scripts.
  • Performed aggregations, joins, and transformations using Hive queries.
  • Implemented partitions, dynamic partitions, and buckets in Hive (a minimal sketch follows this list).
  • Optimized Hive SQL queries, improving job performance.
  • Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of customer and transaction data by date.
  • Performed Hadoop cluster administration, including adding and removing cluster nodes, capacity planning, performance tuning, monitoring, and troubleshooting.
  • Wrote unit test cases for Hive scripts.
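
For illustration only, a minimal sketch of a partitioned and bucketed table layout, expressed here as HiveQL submitted through Spark's Hive support (the original work used Hive scripts directly); table names, columns, bucket count, and paths are hypothetical.

  import org.apache.spark.sql.SparkSession

  object HiveTableLayout {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("hive-partitions")
        .enableHiveSupport()
        .getOrCreate()

      // External table over raw files already landed in HDFS (placeholder path).
      spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS txn_raw (
          txn_id STRING, customer_id STRING, amount DOUBLE, txn_date STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        LOCATION 'hdfs:///data/raw/transactions'""")

      // Curated table partitioned by date and bucketed by customer for join locality.
      spark.sql("""
        CREATE TABLE IF NOT EXISTS txn_curated (
          txn_id STRING, customer_id STRING, amount DOUBLE)
        PARTITIONED BY (txn_date STRING)
        CLUSTERED BY (customer_id) INTO 32 BUCKETS
        STORED AS ORC""")

      // Dynamic partitioning: partition values are taken from the SELECT output.
      spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
      spark.sql("""
        INSERT OVERWRITE TABLE txn_curated PARTITION (txn_date)
        SELECT txn_id, customer_id, amount, txn_date FROM txn_raw""")

      spark.stop()
    }
  }

Note that Spark does not enforce Hive-style bucketing on write, so in practice the bucketed insert would typically be run from Hive itself; the DDL is shown here only to illustrate the partition and bucket layout.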
