
Sr. Data Engineer Resume


Denver, CO

SUMMARY

  • About 8 years of overall professional IT experience, including over 5 years of Big Data expertise using the Hadoop framework: analysis, design, development, documentation, deployment, and integration using SQL and Big Data technologies.
  • Solid knowledge of Hadoop architecture and its various components such as HDFS, YARN, MapReduce, Hive, Pig, HBase, Kafka, and Oozie.
  • Expertise in implementing various Big Data Analytical, Cloud Data engineering, and Data Warehouse / Data Mart, Data Visualization, Reporting, Data Quality, and Data virtualization solutions.
  • Good working knowledge of real-time streaming pipelines using Kafka and Spark Streaming.
  • Experience working with AWS, Azure, and Google Cloud data services.
  • Redesigned views in Snowflake to increase performance.
  • Experience in moving Teradata and Hadoop data objects of both high and low volume to Snowflake.
  • Practical knowledge of several programming languages, including Java, Python, and Scala. Experience using technologies from the Hadoop ecosystem, including HDFS, YARN, MapReduce, Spark, Pig, Sqoop, Hive, Impala, HBase, Kafka, and Crontab.
  • Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats.
  • Experience in end-to-end implementation of projects such as a Data Lake.
  • Experience with different file formats such as ORC, Parquet, Avro, and JSON.
  • Worked on designing, building, deploying, and maintaining MongoDB.
  • Experience working with cloud platforms such as Amazon Web Services and Azure.
  • Experience developing on-premises and real-time processes.
  • Proficiency in multiple databases like MongoDB, Cassandra, MySQL, Oracle, and MS SQL Servers. Strong SQL development skills including writing Stored Procedures, Triggers, Views, and User Defined functions.
  • Worked with multiple Hadoop distributions such as Hortonworks, AWS, Cloudera, and MapR.
  • In-depth understanding of Spark architecture, including Spark Core, Spark SQL, Spark Streaming, Spark MLlib, Spark GraphX, PySpark, and Scala.
  • Optimized HiveQL and Spark SQL queries and applications by creating partitions and buckets based on the use case for better performance.
  • Performance-tuned Spark applications by adjusting partitions (shuffle and dynamic partitions), caching, and using broadcast variables and accumulators (a minimal sketch follows this list).
  • Experience in creating Power BI dashboards and charts as well as Tableau visualizations.
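
The following is a minimal PySpark sketch of the tuning techniques referenced above (shuffle-partition settings, broadcast joins, repartitioning, and caching); the table paths and column names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("tuning-sketch")
        # Match shuffle parallelism to the cluster instead of the default 200.
        .config("spark.sql.shuffle.partitions", "400")
        .getOrCreate()
    )

    # Hypothetical inputs: a large fact table and a small dimension table.
    events = spark.read.parquet("/data/events")          # large
    dim_client = spark.read.parquet("/data/dim_client")  # small

    # Broadcast the small table so the join avoids shuffling the large one.
    joined = events.join(F.broadcast(dim_client), on="client_id", how="left")

    # Repartition on the aggregation key and cache before repeated use.
    joined = joined.repartition(200, "client_id").cache()

    daily_counts = (
        joined.groupBy("client_id", "event_date")
        .agg(F.count("*").alias("events"))
    )
    daily_counts.write.mode("overwrite").partitionBy("event_date").parquet("/data/out/daily_counts")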

TECHNICAL SKILLS

Big Data Technologies: MapReduce, HBase, HDFS, Sqoop, Spark, Hadoop, Hive, PIG, Impala

Cloud Architecture: AWS EC2, Elasticsearch, Elastic Load Balancing, and Azure

Databases: Oracle, Snowflake (Cloud), SQL Server, MySQL, HBase, MongoDB, DynamoDB, and ElastiCache

OLAP tools: Tableau, SAP BO, SSAS, Business Objects & Crystal Reports

Operating System: Linux, Unix and Windows

Web Technologies: HTML, CSS, JavaScript, XML, REST

Tools and IDEs: Eclipse, Maven, Ant, DbVisualizer

Languages: C, C++, Java, Python, SQL, HiveQL

Web Application Servers: Apache Tomcat, Weblogic, JBoss

Azure Components: Azure Data Factory, Databricks, and Azure SQL

PROFESSIONAL EXPERIENCE

Confidential - Denver, CO

Sr. Data Engineer

Responsibilities:

  • Used Power BI for reporting and migrated an entire Oracle database to BigQuery.
  • Built Power BI reports on Azure Analysis Services for better performance.
  • Hands-on experience building data pipelines in Python, PySpark, Hive SQL, and Presto.
  • Worked with Google Data Catalog and other Google Cloud APIs for monitoring, query, and billing analysis of BigQuery usage.
  • Carried out data transformation and cleansing using SQL queries, Python, and PySpark.
  • Involved in importing real-time data to Hadoop using Kafka and implemented the Oozie job for daily imports.
  • Developed a framework to generate daily ad hoc reports and extracts of enterprise data from BigQuery, coordinating with the team.
  • Created Spark applications with Spark SQL in Databricks for data extraction, transformation, and aggregation from various file formats, to analyze and transform the data to reveal consumer usage patterns.
  • Developed Kafka consumer API in Scala for consuming data from Kafka topics.
  • Experienced in writing live real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system, using Scala programming.
  • Wrote, compiled, and executed programs as necessary using Apache Spark in Scala to perform ETL jobs with ingested data.
  • Installed and configured Apache Airflow for workflow management and created workflows in Python (see the DAG sketch after this list).
  • Coordinated with the Data Science team in designing and implementing advanced analytical models on large datasets in the Hadoop cluster.
  • Wrote Hive SQL scripts to create complex tables with performance features such as partitioning, clustering, and skewing.
  • Good knowledge of using Cloud Shell for various tasks and deploying services.
  • Created BigQuery authorized views for row-level security and for exposing data to other teams.
  • Created Databricks notebooks using SQL and Python and automated the notebooks using jobs.
  • Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats (Avro, JSON, XML, flat files); a sketch follows this list.
  • Expertise in designing and deploying Hadoop clusters and different Big Data analytic tools, including Pig, Hive, Sqoop, and Apache Spark, with the Cloudera distribution.
  • Downloaded BigQuery data into pandas and Spark DataFrames for advanced ETL capabilities (see the sketch after this list).
  • Created a POC utilizing ML models and Cloud ML for table quality analysis for the batch process.
  • Knowledge of Cloud Dataflow and Apache Beam.
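
A minimal sketch of the Databricks-style Spark SQL extraction and aggregation mentioned above, reading two of the listed formats (Avro and JSON); the mount paths and column names are hypothetical, and the Avro reader assumes the spark-avro package is available.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("multi-format-etl").getOrCreate()

    # Hypothetical raw inputs in two of the formats listed above.
    avro_df = spark.read.format("avro").load("/mnt/raw/usage_avro/")
    json_df = spark.read.json("/mnt/raw/usage_json/")

    # Align the schemas and union the sources.
    cols = ["user_id", "feature", "event_ts"]
    usage = avro_df.select(*cols).unionByName(json_df.select(*cols))

    # Register a temp view so the aggregation can be expressed in Spark SQL.
    usage.createOrReplaceTempView("usage")
    patterns = spark.sql("""
        SELECT user_id, feature, COUNT(*) AS events
        FROM usage
        GROUP BY user_id, feature
    """)

    patterns.write.mode("overwrite").parquet("/mnt/curated/usage_patterns/")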
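
A minimal sketch of an Airflow workflow like the one referenced above, written against the Airflow 2.x API; the DAG id, schedule, and bash commands are hypothetical placeholders.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="daily_bigquery_extract",   # hypothetical name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Placeholder shell steps standing in for the real extract and report tasks.
        extract = BashOperator(task_id="extract", bash_command="echo extract")
        report = BashOperator(task_id="report", bash_command="echo report")

        extract >> report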
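
A minimal sketch of downloading BigQuery results into a pandas DataFrame, as in the bullet above; the project id, table, and query are hypothetical, and credentials are assumed to come from the environment.

    from google.cloud import bigquery

    client = bigquery.Client(project="my-analytics-project")  # hypothetical project id

    sql = """
        SELECT client_id, DATE(event_ts) AS event_date, COUNT(*) AS events
        FROM `my-analytics-project.usage.events`   -- hypothetical table
        GROUP BY client_id, event_date
    """

    # Run the query and pull the result set into pandas for further ETL.
    df = client.query(sql).to_dataframe()
    print(df.head())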

Environment: Redshift, Spark, Hive, Sqoop, Oozie, HBase, Scala, MapReduce, Azure, Teradata, SQL, Python, R Studio, Excel, PowerPoint, Tableau, Hadoop, PySpark, random forest, Apache Airflow, Databricks.

Confidential - Atlanta, GA

Bigdata Hadoop Developer

Responsibilities:

  • Performed Spark streaming and batch processing using Scala.
  • Used Hive in Spark for data cleansing and transformation.
  • Used Scala and Kafka to create data pipelines for structuring, processing, and transforming given data.
  • Responsible for building scalable distributed data solutions in an EMR cluster environment with Amazon EMR 5.6.1.
  • Performed the migration of Hive and MapReduce jobs from on-premises MapR to the AWS cloud using EMR and Qubole.
  • Experience in data integration and modeling.
  • Implemented performance testing using Apache JMeter and created a dashboard using Grafana to view the results.
  • Participated in creating state-of-the-art data and analytics driven solutions, developing and deploying cutting-edge scalable algorithms, and working across GE to drive business analytics to a new level of predictive analytics while leveraging big data tools and technologies.
  • Developed real-time data feeds and microservices leveraging AWS Kinesis, Lambda, Kafka, Spark Streaming, etc. to enhance useful analytic opportunities and influence customer content and experiences.
  • Used the AWS Glue catalog with a crawler to get data from S3 and perform SQL query operations.
  • Strong knowledge of various data warehousing methodologies and data modeling concepts.
  • Heavily involved in testing Snowflake to understand the best possible way to use the cloud resources.
  • Wrote efficient load and transform Spark code using Python and Spark SQL.
  • Wrote UDFs in Scala and PySpark to meet specific business requirements (see the sketch after this list).
  • Developed JSON scripts for deploying the pipeline in Azure Data Factory (ADF) that processes the data using the SQL activity.
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Identified and utilized existing tools and algorithms from multiple sources to enhance confidence in the assessment of various targets.
  • Worked on ETL testing and used the SSIS Tester automated tool for unit and integration testing.
  • Designed and created an SSIS/ETL framework from the ground up.
  • Managed huge volumes of structured, semi-structured, and unstructured data.
  • Used Oozie to create big data workflows for ingesting data from various sources to Hadoop.
  • Developed Spark jobs using Scala for better data processing and used Spark SQL for querying.
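
A minimal PySpark sketch of a business-rule UDF of the kind mentioned above; the tiering rule, column names, and sample data are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

    # Hypothetical rule: bucket order amounts into business-defined tiers.
    def amount_tier(amount):
        if amount is None:
            return "unknown"
        if amount >= 1000:
            return "large"
        if amount >= 100:
            return "medium"
        return "small"

    amount_tier_udf = F.udf(amount_tier, StringType())

    orders = spark.createDataFrame(
        [(1, 1500.0), (2, 250.0), (3, 20.0)], ["order_id", "amount"]
    )
    orders.withColumn("tier", amount_tier_udf("amount")).show()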

Environment: AWS, Pig, SBT, Sqoop, Maven, ZooKeeper, HDFS, Spark, Scala, PySpark, ADF, Kafka

Confidential - OH

Spark/Hadoop Developer

Responsibilities:

  • Developed multiple Spark jobs in PySpark for data cleaning and Pre-processing.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Wrote Pig UDFs for converting date and timestamp formats from unstructured files to the required formats and processed the data.
  • Created 30 buckets for each Hive table, clustered by client ID, for better performance (optimization) while updating the tables; a DDL sketch follows this list.
  • Wrote Apache Pig scripts to process the HDFS data.
  • Created Hive tables to store the processed results in a tabular format.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Developed simple/complex MapReduce jobs using Hive and Pig.
  • Migrated existing MapReduce programs to Spark using Scala and Python.
  • Accountable for measuring the Spark Databricks cluster's growth, maintaining it, and solving problems.
  • Loaded and transformed large sets of structured, semi structured, and unstructured data.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Worked with application teams to install operating systems, Hadoop updates, patches, and version upgrades as required.
  • Developed Sqoop scripts to enable interaction between Pig and the SQL database.
  • Wrote script files for processing data and loading it to HDFS.
  • Wrote HDFS CLI commands.
  • Developed UNIX shell scripts for creating reports from Hive data.
  • Fully involved in the requirement analysis phase.
  • Moved all log/text files generated by various products into HDFS location.
  • Created External Hive Table on top of parsed data.
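
A minimal sketch of a bucketed Hive table like the one described above (clustered by client id into 30 buckets); the database, table, and columns are hypothetical, and the DDL is issued through Spark SQL here for illustration, though the same statement runs in the Hive CLI.

    from pyspark.sql import SparkSession

    # enableHiveSupport() lets Spark create and manage Hive tables.
    spark = (
        SparkSession.builder
        .appName("hive-bucketing")
        .enableHiveSupport()
        .getOrCreate()
    )

    spark.sql("CREATE DATABASE IF NOT EXISTS analytics")

    # Hypothetical table clustered by client_id into 30 buckets, as in the bullet above.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS analytics.client_events (
            client_id  BIGINT,
            event_type STRING,
            event_ts   TIMESTAMP
        )
        CLUSTERED BY (client_id) INTO 30 BUCKETS
        STORED AS ORC
    """)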

Environment: Sqoop, Linux, XML, PL/SQL, SQL connector, CDH4, MapReduce, HDFS, Hive, Pig, Databricks

Confidential

Data Engineer

Responsibilities:

  • Analyzed the requirements provided by the client and developed a detailed design with the team.
  • Worked with the client team to confirm the design and modify it based on the changes mentioned.
  • Involved in extracting and exporting data from DB2 into AWS for analysis, visualization, and report generation.
  • Created HBase tables and columns to store the user event data.
  • Used Hive and Impala to query the data in HBase.
  • Developed and implemented core API services using Scala and Spark.
  • Managed querying of the data frames using Spark SQL.
  • Used Spark DataFrames to migrate data from AWS to MySQL.
  • Built a continuous ETL pipeline using Kafka, Spark Streaming, and HDFS (see the sketch after this list).
  • Performed ETL on data from various file formats (JSON, Parquet, and Database).
  • Performed complex data transformations using Scala in Spark.
  • Converted SQL queries to Spark transformations using Spark RDDs and Scala.
  • Worked on importing real-time data to Hadoop using Kafka and implemented an Oozie job.
  • Collected log data from web servers and exported to HDFS.
  • Involved in defining job flows, management, and log files reviews.
  • Set up an Oozie workflow to run Spark and Pig jobs simultaneously.
  • Created Hive tables to store the data in tabular format.
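
A minimal sketch of a continuous Kafka-to-HDFS pipeline of the kind described above, using Spark Structured Streaming in PySpark (the original work may have used the DStream API); the broker, topic, and paths are hypothetical, and the Kafka reader assumes the spark-sql-kafka package is available.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

    # Read a stream of events from a Kafka topic.
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
        .option("subscribe", "user-events")                # hypothetical topic
        .option("startingOffsets", "latest")
        .load()
    )

    # Kafka delivers binary key/value columns; keep the value as a string payload.
    payload = events.select(F.col("value").cast("string").alias("json_payload"))

    # Continuously append the raw payloads to HDFS, with checkpointing for recovery.
    query = (
        payload.writeStream.format("parquet")
        .option("path", "hdfs:///data/raw/user_events")
        .option("checkpointLocation", "hdfs:///checkpoints/user_events")
        .outputMode("append")
        .start()
    )
    query.awaitTermination()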

Environment: Oozie, Sqoop, ZooKeeper, MySQL, HBase, Spark, Scala, HDFS, SQL

Confidential

Software Engineer

Responsibilities:

  • Involved in analysis, design, development, system testing, and user acceptance testing. Successfully followed Agile methodology in the Scrum model.
  • Day-to-day responsibilities included daily scrums, debugging and fixing bugs, server monitoring, and research, design, and implementation of new features as required by the product team.
  • Used Spring MVC framework to develop applications.
  • Used Hibernate to establish connectivity to the database.
  • Wrote detailed design documentation based on the requirements.
  • Developed Business classes using Spring POJO.
  • Design and Development of Objects using Object Oriented Design in Java.
  • Used the Spring framework for the MVC implementation, with Spring for dependency injection and Hibernate as the ORM tool for database communication.
  • Developed web services using HTTP, XML technologies.
  • Used Git for source control management, which gives a significant speed advantage over centralized systems that must communicate with a server.
  • Developed the application using Java 8 and implemented its features such as lambda expressions, the Date and Time API, streams, functional interfaces, collectors, default methods, type inference, and forEach.
  • Used Spring Core, Spring Web MVC, Spring ORM, Spring JDBC, DAO, Spring AOP.
  • Involved in creating and deploying REST APIs and microservices in Java/J2EE using Spring Boot.
  • Interacted with business users and analysts to gather requirements for all the use case implementations.

Environment: Hibernate, Maven, GIT, Oracle, Tomcat, WebLogic, Windows, Java, Spring IOC, Spring MVC, JavaScript
