Senior Data Engineer Resume

Nashville, TN

SUMMARY

  • Over 8 years of experience in data engineering and in data pipeline design, development, and implementation as a Sr. Data Engineer/Data Developer.
  • Configured ZooKeeper to coordinate and support Kafka, Spark, Spark Streaming, HBase, and HDFS.
  • Set up Azure infrastructure such as storage accounts, integration runtimes, service principal IDs, and app registrations to enable scalable and optimized support of business users' analytical requirements in Azure.
  • Generated JSON scripts and wrote UNIX shell scripts to call Sqoop import/export.
  • Exploratory data analysis and data wrangling with R and Python.
  • Good knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Extensive experience in IT data analytics projects; hands-on experience migrating on-premises ETLs to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google Cloud Storage, and Cloud Composer.
  • Implemented a log producer in Scala that watches for application logs, transforms incremental logs, and sends them to a Kafka- and ZooKeeper-based log collection platform.
  • Experience importing and exporting data between HDFS and relational database systems using Sqoop, and loading the data into partitioned Hive tables.
  • Good knowledge of streaming applications using Apache Kafka.
  • Experience with GCP Dataproc, GCS, Cloud Functions, and BigQuery.
  • Worked with relational SQL and NoSQL data stores and tools, including Oracle, Hive, Sqoop, and HBase.
  • Designed and executed Oozie workflows that schedule Sqoop and Hive job actions to extract, transform, and load data.
  • Migrated databases to the SQL Azure cloud platform and performed performance tuning.
  • Experienced with Hadoop/Hive on AWS, using both EMR and non-EMR Hadoop on EC2.
  • Experience developing Kafka producers and consumers for streaming millions of events per second.
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
  • Implemented sentiment analysis and text analytics on Twitter social media feeds and market news using Scala and Python.
  • Keen to learn the newer technology stack that Google Cloud Platform (GCP) adds.
  • Hands-on experience installing and configuring Cloudera Apache Hadoop ecosystem components such as Flume, HBase, ZooKeeper, Oozie, Hive, Sqoop, and Pig.
  • Installed Hadoop, MapReduce, HDFS, and AWS and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Worked with real-time data processing and streaming techniques using Spark Streaming and Kafka.
  • Pipeline development skills with Apache Airflow, Kafka, and NiFi
  • Extensive use of the open-source languages Perl, Python, Scala, and Java.
  • Migrated projects from Cloudera Hadoop Hive storage to Azure Data Lake Store to satisfy Confidential Digital transformation strategy
  • Performed data synchronization between EC2 and S3, Hive stand-up, and AWS profiling.
  • Used the Spark DataFrame API in Scala to analyze data.

TECHNICAL SKILLS

Big Data Ecosystem: MapReduce, HDFS, Hive, HBase, Pig, Sqoop, Impala, Flume, HDP, Oozie, ZooKeeper, Spark, Kafka, Storm, Hue

Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks

Cloud Platform: Amazon Web Services (AWS), MS Azure

Relational Databases: Oracle 12c, MySQL, MS SQL Server 2016

NoSQL Databases: HBase, Hive, and MongoDB

Version Control: Git, GitLab, SVN

Programming Languages: Java, Python, SQL, PL/SQL, HiveQL, UNIX Shell Scripting, Scala.

Software Development: Software Development Lifecycle (SDLC), Waterfall Model and Agile, STLC

Web Technologies: JavaScript, CSS, HTML and JSP.

Operating Systems: Windows, UNIX/Linux and Mac OS.

Build Management Tools: Maven, Ant.

PROFESSIONAL EXPERIENCE

Confidential, Nashville, TN

Senior Data Engineer

Responsibilities:

  • Experience in big data analysis using Pig and Hive, and understanding of Sqoop and Puppet.
  • Created YAML files for each data source, including Glue table stack creation.
  • Extensive experience with Hadoop ecosystem components such as Hadoop, MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, and Flume.
  • Developed real-time SLA monitoring dashboards in Tableau for the Kafka message load into SAP HANA.
  • Proposed an automated system using shell scripts to run the Sqoop jobs.
  • Hands-on experience with GCP: BigQuery, GCS buckets, Cloud Functions, cloud migration, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil, bq command-line utilities, Dataproc, and Stackdriver.
  • Developed Spark jobs using Scala for faster real-time analytics and used Spark SQL for querying
  • Acquired data from REST APIs/JSON; wrangled data with Python and UNIX tools; segmented and organized data from disparate sources and loaded it into Google BigQuery.
  • Implemented data ingestion and cluster handling for real-time processing using Kafka.
  • Designed a data analysis pipeline in Python, using Amazon Web Services such as S3, EC2, and Elastic MapReduce.
  • Used the Cloud Shell SDK in GCP to configure the Dataproc, Storage, and BigQuery services.
  • Used ZooKeeper to provide coordination services to the cluster. Experienced in managing and reviewing Hadoop log files.
  • Experience working with Google Cloud Platform technologies such as BigQuery, Dataflow, Dataproc, Pub/Sub, and Airflow.
  • Hands-on experience migrating existing on-premises Hive code to GCP (Google Cloud Platform) BigQuery.
  • Used Sqoop to channel data from different HDFS and RDBMS sources.
  • Extensive experience importing and exporting data using stream processing platforms such as Flume and Kafka.
  • Used Cloud Functions with Python to load data into BigQuery as CSV files arrive in a GCS bucket.
  • Deep analytics and understanding of big data and algorithms using Hadoop, MapReduce, NoSQL, and distributed computing tools.
  • Developed Oozie workflow schedulers to run multiple Hive and Pig jobs that run independently based on time and data availability.
  • Processed and loaded bounded and unbounded data from Google Pub/Sub topics to BigQuery using Cloud Dataflow with Python.
  • Experienced in troubleshooting errors in HBase Shell/API, Pig, Hive, and MapReduce.
  • Developed Spark applications using PySpark and Spark-SQL for data extraction, transformation and aggregation from multiple file formats.
  • Used HBase/Phoenix to support front end applications that retrieve data using row keys
  • Developed a strategy for Full load and incremental load using Sqoop.
  • Experience developing customized UDFs in Java to extend Hive and Pig Latin functionality.
  • Imported documents into HDFS and HBase and created HAR files.
  • Experience using ZooKeeper and Oozie operational services to coordinate clusters and schedule workflows.
  • Involved in creating HiveQL on HBase tables and efficiently importing work order data into Hive tables.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Experience in cloud computing on Google Cloud Platform with technologies such as Dataflow, Pub/Sub, BigQuery, and related tools.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
  • Executed parameterized Pig, Hive, Impala, and UNIX batches in production.
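
The full-load versus incremental-load strategy mentioned in the list above can be illustrated with a small plain-Python sketch. It is not the Sqoop jobs themselves: the `load` function, the `updated_at` check column, and the in-memory source and target are hypothetical names chosen for illustration. Sqoop's own incremental imports work on a similar idea, with a check column and a last imported value.

```python
def load(source, target, last_value=None):
    """Copy rows from source (list of dicts) into target (dict keyed by 'id').

    last_value=None means a full load: the target is rebuilt from scratch.
    Otherwise only rows whose 'updated_at' is newer than last_value are
    upserted. Returns (rows_loaded, new_watermark).
    """
    if last_value is None:
        target.clear()
        changed = list(source)
    else:
        # ISO-8601 date strings compare correctly as plain strings.
        changed = [r for r in source if r["updated_at"] > last_value]
    for row in changed:
        target[row["id"]] = dict(row)
    # The new watermark is the highest timestamp seen so far.
    stamps = [r["updated_at"] for r in changed]
    if last_value is not None:
        stamps.append(last_value)
    return len(changed), max(stamps, default=None)


# Hypothetical sample data, for illustration only.
source = [
    {"id": 1, "updated_at": "2020-01-01", "v": "a"},
    {"id": 2, "updated_at": "2020-01-02", "v": "b"},
]
target = {}

n_full, mark = load(source, target)            # full load
source[0] = {"id": 1, "updated_at": "2020-01-03", "v": "a2"}
source.append({"id": 3, "updated_at": "2020-01-04", "v": "c"})
n_incr, mark2 = load(source, target, mark)     # incremental load
print(n_full, mark, n_incr, mark2)
```

Persisting the returned watermark between runs is what lets each incremental run pick up only rows changed since the previous one.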

Environment: Big Data, Hadoop, Oracle, PL/SQL, Scala, Spark SQL, PySpark, Python, Kafka, Oozie, SSIS, T-SQL, ETL, HDFS, Cosmos, ZooKeeper, Hive, HBase

Confidential, New York, NY

Azure Big Data Engineer

Responsibilities:

  • Involved in importing real-time Jet.com data to HDFS from Kafka; implemented and scheduled hourly runs using Automic/UC4.
  • Experienced in using Platfora, a data visualization tool specific to Hadoop; created various Lens and Viz boards for real-time visualization from Hive tables.
  • Extensive use of Azure Portal, Azure PowerShell, Storage Accounts, Certificates, and Azure Data Management.
  • Involved in importing and exporting data from HDFS using Sqoop and in resolving access issues, performance issues, and patch/upgrade-related issues.
  • Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and Scala for data cleaning and preprocessing.
  • Developed workflows in Oozie to automate loading data into HDFS and pre-processing it with Pig.
  • Performed data analysis and developed analytic solutions; investigated data to discover correlations and trends and explained them.
  • Developed an automation system using PowerShell scripts and JSON templates to remediate Azure services.
  • Scripted all the Hadoop jobs in Shell and Python.
  • Used Spark SQL functions to move data from staging Hive tables to fact and dimension tables.
  • Used HBase to store the majority of the data, which needs to be divided by region.
  • Used Oozie and ZooKeeper operational services for coordinating the cluster and scheduling workflows.
  • Developed frameworks and processes to analyze unstructured information. Assisted in Azure Power BI architecture design.
  • Developed custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HQL (HiveQL), and used UDFs from the Piggybank UDF repository.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
  • Experienced in troubleshooting errors in HBase Shell/API, Pig, Hive, and MapReduce.
  • Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
  • Migrated MapReduce jobs to Spark jobs to achieve better performance.
  • Used HBase/Phoenix to support front-end applications that retrieve data using row keys.
  • Built Kafka monitoring scripts to monitor Kafka loads into Hadoop cluster.
  • Experience writing Sqoop scripts for importing and exporting data between RDBMS and HDFS.
  • Good exposure to MapReduce programming using Java, Pig Latin scripting, distributed applications, and HDFS.
  • Expertise in using Docker to run and deploy applications in multiple containers, with tools such as Docker Swarm and Docker Wave.
  • Worked on designing the MapReduce and YARN flow, writing MapReduce scripts, performance tuning, and debugging.
  • Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access.
  • Developed Pig Latin scripts for the analysis of semi-structured data.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and moved data between MySQL and HDFS in both directions using Sqoop.
  • Developed Java MapReduce programs for the analysis of sample log files stored in the cluster.
  • Used Spark Streaming with Scala to receive real-time data from Kafka and store the stream data in HDFS and in NoSQL databases such as HBase and Cassandra.
  • Excellent knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MRv1, and MRv2 (YARN).

Environment: Hadoop, MapReduce, HDFS, Hive, Spring Boot, Cassandra, Swamp, Data Lake, Sqoop, Oozie, SQL, Kafka, Spark, Scala, Java, GitHub, Talend Big Data Integration, Solr, Impala.

Confidential, Columbia, SC

Data Engineer

Responsibilities:

  • Created functions and assigned roles in AWS Lambda to run Python scripts, and used AWS Lambda with Java to perform event-driven processing.
  • Used Kafka features such as distribution, partitioning, and the replicated commit log service for messaging systems by maintaining feeds.
  • Involved in requirement gathering and business analysis, and translated business requirements into technical designs in Hadoop and big data.
  • Involved in the Sqoop implementation, which loads data from various RDBMS sources into Hadoop systems and vice versa.
  • Developed a Python script to load CSV files into S3 buckets; created AWS S3 buckets, performed folder management in each bucket, and managed logs and objects within each bucket.
  • Analyzed system failures, identified root causes, and recommended courses of action; documented system processes and procedures for future reference.
  • Involved in configuring the Hadoop cluster and load balancing across the nodes.
  • Involved in Hadoop installation, commissioning, decommissioning, balancing, troubleshooting, monitoring, and debugging the configuration of multiple nodes using the Hortonworks platform.
  • Configured Spark Streaming to receive ongoing information from Kafka and store the stream information in HDFS.
  • Loaded DStream data into Spark RDDs and performed in-memory computation to generate the output response.
  • Involved in performance tuning of Spark jobs using caching and taking full advantage of the cluster environment.
  • Wrote scripts in Python for Location Analytic project deployment on a Linux cluster/farm and on AWS Cloud.
  • Worked extensively on Informatica Partitioning when dealing with huge volumes of data.
  • Used Teradata external loaders such as MultiLoad, TPump, and FastLoad in Informatica to load data into the Teradata database.
  • Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
  • Created several types of data visualizations using Python and Tableau. Extracted Mega Data from AWS using SQL Queries to create reports.
  • Experience using Avro, Parquet, RCFile, and JSON file formats; developed UDFs in Hive and Pig.
  • Involved in loading data from REST endpoints to Kafka producers and transferring the data to Kafka brokers.
  • Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files.
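
The JSON-flattening step named in the last bullet can be illustrated with a minimal, library-free Python sketch. It is not the Spark DataFrame job described above; the function names `flatten_record` and `to_csv`, the dotted column naming, and the sample document are assumptions made for illustration only.

```python
import csv
import io
import json


def flatten_record(obj, parent_key="", sep="."):
    """Flatten nested dicts/lists into a single-level dict with dotted keys."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = ((str(i), v) for i, v in enumerate(obj))
    else:
        # A scalar passed in directly: keep it under the parent key.
        return {parent_key: obj}
    flat = {}
    for key, value in items:
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, (dict, list)) and value:
            # Recurse into non-empty containers.
            flat.update(flatten_record(value, new_key, sep))
        else:
            # Scalars and empty containers are stored as-is.
            flat[new_key] = value
    return flat


def to_csv(records):
    """Render flattened records as CSV text; columns are the union of all keys."""
    rows = [flatten_record(r) for r in records]
    columns = sorted({k for row in rows for k in row})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing columns are written as empty strings
    return buf.getvalue()


# Hypothetical sample document, for illustration only.
doc = json.loads('{"id": 1, "user": {"name": "a", "tags": ["x", "y"]}}')
print(to_csv([doc]))
```

Using the union of keys as the header keeps documents with differing shapes in one flat file, at the cost of sparse columns.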

Environment: Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala, Cassandra, HDFS, Scala, Spark RDD, Spark SQL, Kafka.

Confidential

Big Data Engineer

Responsibilities:

  • Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
  • Developed and implemented Apache NiFi across various environments and wrote QA scripts in Python for tracking files.
  • Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression, then loaded data into the Parquet Hive tables from Avro Hive tables.
  • Involved in running the Hive scripts through Hive, Hive on Spark, and, for some, Spark SQL.
  • Enhanced the data ingestion framework by creating more robust and secure data pipelines.
  • Involved in the complete big data flow of the application, from upstream data ingestion into HDFS to processing and analyzing the data in HDFS.
  • Implemented reporting in PySpark and Zeppelin, with querying through Airpal and AWS Athena.
  • Wrote JUnit tests and integration test cases for the microservices.
  • Worked heavily with Python, C++, Spark, SQL, Airflow, and Looker.
  • Proven experience with ETL frameworks (Airflow, Luigi, or our own open sourced garcon)
  • Created Hive schemas using performance techniques like partitioning and bucketing.
  • Created data models for AWS Redshift and Hive from dimensional data models.
  • Implemented a prototype for the complete requirements using Splunk, Python, and machine learning concepts.
  • Worked with multiple storage formats (Avro, Parquet) and databases (Hive, Impala, Kudu).
  • Developed a dimensional model based on star and snowflake schemas to build the data warehouse.
  • Actively participated in data mapping activities for the data warehouse.
  • Built machine learning models to showcase big data capabilities using PySpark and MLlib.
  • Worked with AWS services such as EMR and EC2 for fast and efficient processing of big data.

Environment: ER/Studio, Python, OLAP, OLTP, Oracle, ETL, SQL, PL/SQL, Teradata, SSIS, SSRS, T-SQL, XML.

Confidential

Data Engineer

Responsibilities:

  • Involved in loading data into HBase NoSQL database.
  • Built, managed, and scheduled Oozie workflows for end-to-end job processing.
  • Worked on the Hortonworks HDP 2.5 distribution.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Used SSIS to transform and load data into a SQL database via FTP, with text files and MS Excel as sources.
  • Managed security by assigning permissions and roles. Designed scripts to automate maintenance tasks.
  • Built PL/SQL objects (procedures, functions, triggers, and packages) to summarize data and populate summary tables used for generating reports with improved performance.
  • Involved in importing data from MS SQL Server, MySQL and Teradata into HDFS using Sqoop.
  • Played a key role in dynamic partitioning and Bucketing of the data stored in Hive Metadata.
  • Wrote HiveQL queries that join different tables to create views and produce result sets.
  • Collected the log data from Web Servers and integrated into HDFS using Flume.
  • Worked on loading and transforming of large sets of structured and unstructured data.
  • Used MapReduce programs for data cleaning and transformation and loaded the output into Hive tables in different file formats.
  • Worked on extending Hive and Pig core functionality by writing custom UDFs using Java.
  • Analyzed large volumes of structured data using Spark SQL.
  • Migrated HiveQL queries to Spark SQL to improve performance.

Environment: Hortonworks, Hadoop, HDFS, Pig, Sqoop, Hive, Oozie, Zookeeper, NoSQL, HBase, Shell Scripting, Scala, Spark, Spark SQL.
