
Sr. Big Data Engineer Resume


Minneapolis, MN

SUMMARY:

  • Over 10 years of extensive hands-on Big Data experience with the Hadoop ecosystem across internal and cloud-based platforms.
  • Expertise in cloud computing and Hadoop architecture and its components: Hadoop Distributed File System (HDFS), MapReduce, Spark, NameNode, DataNode, JobTracker, TaskTracker, and Secondary NameNode.
  • Experience working with Google Cloud Platform technologies such as BigQuery, Dataflow, Dataproc, Pub/Sub, and Airflow.
  • Designed and developed an ingestion framework over Google Cloud and Hadoop clusters.
  • Good knowledge of Hadoop cluster architecture and monitoring.
  • Extensive experience importing and exporting data using Kafka.
  • Experience in Spark, Scala, and Kafka.
  • Hands-on experience using Hadoop ecosystem components such as Oozie, Hive, Sqoop, Flume, Hue, and Impala.
  • Experience importing and exporting data with Sqoop between HDFS and relational database systems.
  • Experience developing and scheduling ETL workflows in Hadoop using Oozie, along with deploying and managing Hadoop clusters using Cloudera and Hortonworks.
  • Experience importing data from relational databases into the Hive metastore using Sqoop.
  • Experience creating managed and external tables in Hive (see the sketch after this list).
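
As a minimal sketch of the managed vs. external Hive table distinction noted above, the snippet below issues Hive DDL through Spark SQL from Python; the database name, table names, and HDFS path are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

# Hive support lets Spark SQL talk to the Hive metastore.
spark = (SparkSession.builder
         .appName("hive_tables_sketch")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS sales_db")

# Managed table: Hive owns both the metadata and the data files.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_db.orders_managed (
        order_id BIGINT,
        amount   DOUBLE,
        order_dt STRING
    )
    STORED AS PARQUET
""")

# External table: Hive tracks only the metadata; dropping the table
# leaves the underlying HDFS files in place.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_db.orders_external (
        order_id BIGINT,
        amount   DOUBLE,
        order_dt STRING
    )
    STORED AS PARQUET
    LOCATION 'hdfs:///data/raw/orders'
""")
```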

TECHNICAL SKILLS:

Big Data Ecosystems: HDFS, MapReduce, YARN, Apache Spark, Hive, Pig, Oozie, ZooKeeper, Apache NiFi, Apache Storm, Apache Kafka, Sqoop, Flume

Cloud Technologies: Google Cloud Platform (BigQuery, Dataflow, Dataproc, Pub/Sub), AWS (S3, Redshift, Lambda)

Scripting Languages: Python, Shell (Bash)

Programming Languages: Python, Java, Scala

Databases: Netezza, SQL Server, MySQL, Oracle, DB2

IDEs / Tools: Eclipse, JUnit, Maven, Ant, MS Visual Studio, NetBeans

Methodologies: Agile, Waterfall

PROFESSIONAL EXPERIENCE:

Confidential

Sr. Big Data Engineer

Responsibilities:

  • Performed complex day-to-day development and support for the GCP cloud environment
  • Developed, maintained, and tuned highly complex scripts using Python and BigQuery (a minimal sketch follows this list)
  • Designed and managed solutions covering data engineering, data analysis, data modeling, data warehousing, data security, and ETL
  • Implemented a parser and interpreter for migrating data from an IBM mainframe to Google BigQuery
  • Built data integration and preparation tools using cloud technologies such as Google Dataflow, Cloud Dataprep, and Python
  • Assisted with Google Cloud Platform (GCP) operations
  • Implemented CI/CD pipelines that initiated code builds from repositories and orchestrated deployments to Google Kubernetes Engine (GKE) containers
  • Configured native security tools available within Google Cloud Platform
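
As a minimal sketch of the Python-and-BigQuery scripting described above (assuming the google-cloud-bigquery client library), the snippet below runs a parameterized query; the project, dataset, and table names are hypothetical.

```python
from google.cloud import bigquery

# The client picks up credentials from the environment
# (e.g. GOOGLE_APPLICATION_CREDENTIALS); "my-project" is a placeholder.
client = bigquery.Client(project="my-project")

# Parameterized query against a hypothetical events table.
query = """
    SELECT event_date, COUNT(*) AS event_count
    FROM `my-project.analytics.events`
    WHERE event_date >= @start_date
    GROUP BY event_date
    ORDER BY event_date
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("start_date", "DATE", "2021-01-01"),
    ]
)

# Run the query job and iterate over the result rows.
for row in client.query(query, job_config=job_config).result():
    print(row.event_date, row.event_count)
```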

Confidential, Minneapolis, MN

Sr. Big Data Engineer / Cloud Engineer

Responsibilities:

  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and moved data between MySQL and HDFS using Sqoop.
  • Designed and implemented an end-to-end big data platform on a Teradata appliance.
  • Performed ETL from multiple sources such as Kafka, NiFi, Teradata, and DB2 using Spark on Hadoop.
  • Worked on Apache Spark, utilizing the Spark SQL and Spark Streaming components to support intraday and real-time data processing.
  • Developed Python and Bash scripts to automate jobs and provide control flow.
  • Moved data from Teradata to a Hadoop cluster using TDCH/FastExport and Apache NiFi.
  • Developed Python, PySpark, and Bash scripts to transform and load data across on-premise and cloud platforms.
  • Built ETL data pipelines for data movement to S3 and then to Redshift.
  • Wrote AWS Lambda code in Python for converting, comparing, and sorting nested JSON files.
  • Installed and configured Apache Airflow for workflow management and created workflows in Python.
  • Wrote PySpark UDFs to perform transformations and loads (see the sketch after this list).
  • Wrote TDCH scripts and used Apache NiFi to load data from mainframe DB2 to the Hadoop cluster.
  • Worked with ORC, Avro, JSON, and Parquet file formats, creating external tables and querying on top of these files using BigQuery.
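
A minimal sketch of a PySpark UDF used in a transform-and-load step, as referenced above; the UDF logic and the staging/curated table names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = (SparkSession.builder
         .appName("pyspark_udf_sketch")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical UDF: normalize a free-text country field before the load step.
@F.udf(returnType=StringType())
def clean_country(value):
    return (value or "").strip().upper()

# Read a staging Hive table, apply the transformation, and load the result
# into a curated table as Parquet; the table names are placeholders.
df = spark.table("staging.customers")
(df.withColumn("country", clean_country(F.col("country")))
   .write.mode("overwrite")
   .format("parquet")
   .saveAsTable("curated.customers"))
```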

Confidential, Denver, CO

Sr. data analyst/ Big Data Engineer

Responsibilities:

  • Used Hive queries in Spark SQL for analysis and processing of data
  • Hands-on experience in the installation, configuration, support, and management of Hadoop clusters
  • Wrote shell scripts that run multiple Hive jobs to automate incremental loads of different Hive tables, which are used to generate business reports in Tableau
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive
  • Enhanced and optimized production Spark code to aggregate, group, and run data mining tasks using the Spark framework, and handled JSON data
  • Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing
  • Involved in business analysis and technical design sessions with business and technical staff to develop requirements documents and ETL design specifications.
  • Wrote complex SQL scripts to avoid Informatica lookups and improve performance, as the data volume was heavy.
  • Responsible for the design, development, and data modeling of Spark SQL scripts based on functional specifications
  • Designed and developed extract, transform, and load (ETL) mappings, procedures, and schedules, following the standard development lifecycle
  • Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms
  • Worked closely with Quality Assurance, Operations, and Production Support groups to devise test plans, answer questions, and resolve data or processing issues
  • Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Databricks connectors, Spark Core, Spark SQL, GCP, Sqoop, Hive, and NoSQL databases
  • Wrote Spark SQL scripts optimized for query performance
  • Well-versed in Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and MapReduce programming
  • Implemented Hive UDFs and performed tuning for better results
  • Developed and tuned SQL in HiveQL, Drill, and Spark SQL
  • Used Sqoop to import and export data between Oracle and HDFS/Hive
  • Developed Spark code using Spark RDDs and Spark SQL/Streaming for faster data processing
  • Implemented partitioning, data modeling, dynamic partitions, and buckets in Hive for efficient data access (see the sketch after this list)
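
A minimal sketch of the dynamic-partitioning pattern referenced above, issued as Hive DDL/DML through Spark SQL from Python (bucketing would be declared similarly in the Hive DDL); the table and column names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive_partitioning_sketch")
         .enableHiveSupport()
         .getOrCreate())

# Dynamic partitioning must be switched on before partitioned inserts.
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

# Hypothetical clickstream table partitioned by event date.
spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.clicks_part (
        user_id BIGINT,
        url     STRING
    )
    PARTITIONED BY (event_dt STRING)
    STORED AS ORC
""")

# Dynamic-partition insert: each row lands in its own event_dt partition,
# so queries filtered on event_dt scan only the partitions they need.
spark.sql("""
    INSERT OVERWRITE TABLE analytics.clicks_part PARTITION (event_dt)
    SELECT user_id, url, event_dt
    FROM staging.clicks_raw
""")
```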

Confidential

Sr. Hadoop/Spark Developer

Responsibilities:

  • Involved in the process of Cassandra data modelling and building efficient data structures.
  • Developed MapReduce (YARN) jobs for cleaning, accessing and validating the data.
  • Created and ran Sqoop jobs with incremental load to populate Hive external tables.
  • Developed optimal strategies for distributing the web log data over the cluster, importing and exporting the stored web log data into HDFS and Hive using Sqoop.
  • Wrote Pig scripts to transform raw data from several data sources into baseline data.
  • Implemented Hive generic UDFs to incorporate business logic into Hive queries.
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and most visited pages on the website (see the sketch after this list).
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts)
  • Created Hive tables and worked on them using HiveQL.
  • Designed and implemented static and dynamic partitioning and buckets in Hive.
  • Worked on cluster coordination services through ZooKeeper.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Built applications using Maven and integrated with CI servers like Jenkins to build jobs.
  • Exported the analyzed data to the RDBMS using Sqoop to generate reports for the BI team.
  • Worked collaboratively with all levels of business stakeholders to architect, implement, and test Big Data based analytical solutions from disparate sources.
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning.
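
A minimal sketch of the web-log analysis described above (unique visitors, page views, and most visited page per day), written as HiveQL run through Spark SQL from Python; the web_logs.page_hits table and its columns are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("weblog_analysis_sketch")
         .enableHiveSupport()
         .getOrCreate())

# Unique visitors and page views per day from a hypothetical web-log table.
daily = spark.sql("""
    SELECT
        to_date(event_ts)          AS visit_date,
        COUNT(DISTINCT visitor_id) AS unique_visitors,
        COUNT(*)                   AS page_views
    FROM web_logs.page_hits
    GROUP BY to_date(event_ts)
    ORDER BY visit_date
""")
daily.show(truncate=False)

# Most visited page per day, ranking daily hit counts with a window function.
top_pages = spark.sql("""
    SELECT visit_date, page_url, hits
    FROM (
        SELECT
            to_date(event_ts) AS visit_date,
            page_url,
            COUNT(*)          AS hits,
            ROW_NUMBER() OVER (PARTITION BY to_date(event_ts)
                               ORDER BY COUNT(*) DESC) AS rn
        FROM web_logs.page_hits
        GROUP BY to_date(event_ts), page_url
    ) ranked
    WHERE rn = 1
""")
top_pages.show(truncate=False)
```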

Confidential

Data Analyst / Hadoop Developer

Responsibilities:

  • Involved in the full project life cycle, from design, analysis, and logical and physical architecture modeling through development, implementation, and testing.
  • Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools like Hive, Pig, ZooKeeper, and Sqoop.
  • Designed, configured, implemented, and monitored Kafka clusters and connectors.
  • Implemented a proof of concept (PoC) using Kafka, Storm, and HBase for processing streaming data.
  • Used Sqoop to import data into HDFS and Hive from multiple data systems.
  • Developed complex queries using Hive and Impala.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Helped with the sizing and performance tuning of the Cassandra cluster.
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs (see the sketch after this list).
  • Used Hive on Tez to transform the data as per business requirements for batch processing.
  • Developed multiple PoCs using Spark, deployed them on the YARN cluster, and compared the performance of Spark and SQL.
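
A minimal sketch of converting a HiveQL aggregate into equivalent Spark DataFrame transformations, as referenced above; the sales_db.orders table and its columns are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive_to_spark_sketch")
         .enableHiveSupport()
         .getOrCreate())

# Original HiveQL (hypothetical): daily order totals per customer.
hive_df = spark.sql("""
    SELECT customer_id, order_dt, SUM(amount) AS total_amount
    FROM sales_db.orders
    GROUP BY customer_id, order_dt
""")

# The same logic expressed as DataFrame transformations, which keeps the
# aggregation inside Spark's optimizer instead of a handwritten query string.
orders = spark.table("sales_db.orders")
spark_df = (orders
            .groupBy("customer_id", "order_dt")
            .agg(F.sum("amount").alias("total_amount")))

# Both produce the same result; the RDD API could express it as well
# (e.g. map then reduceByKey), at the cost of losing the optimizer.
spark_df.show(truncate=False)
```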

EDUCATION:

B.S. in Computer Science, Punjab Technical University ( )
