
Sr. Data Engineer Resume


Jacksonville, FL

SUMMARY

  • Over 8 years of technical IT experience in all phases of the Software Development Life Cycle (SDLC), with skills in data analysis, design, development, testing and deployment of software systems.
  • Designed and implemented data ingestion techniques for data coming from various data sources.
  • 2+ years of experience with Microsoft Azure cloud services such as Azure SQL Data Warehouse, Azure SQL Server, Azure Databricks, Azure Data Lake, Azure Blob Storage and Azure Data Factory.
  • Hands-on experience with Big Data Hadoop ecosystem components such as Spark, Hive, Sqoop, Kafka, Flume, ZooKeeper, HBase and MapReduce.
  • Extensive understanding of Hadoop architecture, workload management, schedulers, scalability and components such as YARN and MapReduce, with strong SQL programming and HiveQL skills for Hive and HBase.
  • Experience in importing and exporting data into HDFS and Hive using Sqoop.
  • Experience in converting SQL queries into Spark transformations using Spark RDDs and Scala, including performing map-side joins on RDDs (see the sketch following this list).
  • Experience working with Teradata and batch-processing its data using distributed computing.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems/ Non-Relational Database Systems and vice-versa.
  • Experience with distributed systems, large-scale non-relational data stores, MapReduce systems, data modeling and big data systems.
  • Hands-on experience in creating data pipelines using Kafka, Flume and Storm for security, fraud and compliance-violation use cases.
  • Good knowledge of Apache NiFi for automating data movement between different Hadoop systems.
  • Experienced in loading data to hive partitions and creating buckets in Hive.
  • Experience developing Kafka producers and Kafka Consumers for streaming millions of events per second on streaming data.
  • Experience in job/workflow scheduling and monitoring tools like Oozie, AWS Data pipeline & Autosys.
  • Defined and deployed monitoring, metrics, and logging systems on AWS.
  • Experience in server infrastructure development on the AWS cloud, with extensive usage of Virtual Private Cloud (VPC), IAM, EC2, RDS, S3, SQS, SNS and Lambda, focusing on high availability, fault tolerance and auto-scaling with CloudWatch monitoring.
  • Experience in the design and implementation of Continuous Integration, Continuous Delivery, Continuous Deployment and DevOps processes for Agile projects Using Jenkins, Git, Docker.
  • Experienced with performing analytics on time-series data using HBase and the Java API.
  • Ability to tune Big Data solutions to improve performance and end-user experience.
  • Hands on experience on AWS infrastructure services Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2) and Microsoft Azure.
  • Involved in creating Hive tables, loading them with data and writing ad hoc Hive queries that run internally on MapReduce and Tez.
  • Knowledge of job workflow management and coordinating tools like Oozie.
  • Experience working with NoSQL database technologies, including MongoDB, Cassandra and HBase.
  • Strong understanding of logical and physical database models and entity-relationship modeling.
  • Worked in Agile methodology and used JIRA for development and project tracking.
  • Good development experience with Agile and Waterfall methodologies and the SDLC.
  • Involved in Agile practices, including daily scrum meetings and sprint planning.
  • Hands-on experience with VPN, PuTTY, WinSCP, etc.; responsible for creating Hive tables based on business requirements.
  • Worked on multiple stages of Software Development Life Cycle including Development, Component Integration, Performance Testing, Deployment and Support Maintenance.
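
As a concrete illustration of the Spark RDD work summarized above, the following minimal Scala sketch shows a map-side (broadcast) join standing in for a SQL equi-join. Paths, table names and column positions are illustrative placeholders, not details from a specific engagement.

    import org.apache.spark.{SparkConf, SparkContext}

    object MapSideJoinSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("map-side-join-sketch"))

        // Rough SQL equivalent:
        //   SELECT t.txn_id, d.dept_name
        //   FROM transactions t JOIN departments d ON t.dept_id = d.dept_id
        // The small dimension table is collected to the driver and broadcast to every executor.
        val departments = sc.textFile("hdfs:///data/departments.csv")
          .map(_.split(","))
          .map(cols => (cols(0), cols(1)))       // (dept_id, dept_name)
          .collectAsMap()
        val deptLookup = sc.broadcast(departments)

        // Each partition of the large RDD joins locally against the broadcast map, avoiding a shuffle.
        val joined = sc.textFile("hdfs:///data/transactions.csv")
          .map(_.split(","))
          .flatMap { cols =>
            deptLookup.value.get(cols(1)).map(deptName => (cols(0), deptName))  // (txn_id, dept_name)
          }

        joined.saveAsTextFile("hdfs:///out/transactions_with_dept")
        sc.stop()
      }
    }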

TECHNICAL SKILLS

Big Data Ecosystem: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Cassandra, Impala, Oozie, Zookeeper, MapR, Amazon Web Services (AWS), EMR, Azure

IDEs: Eclipse, IntelliJ, Spyder, Jupyter

Operating Systems: Windows, Linux

Programming Languages: Python, Scala, Linux shell scripts, PL/SQL, C, C++, Java

Databases: Oracle 11g/10g/9i, MySQL, DB2, MS SQL Server, Netezza, HBase, Hive, MongoDB, Postgres, Azure SQL Data Warehouse

Web Servers: WebLogic, WebSphere, Apache Tomcat

Web Technologies: HTML, XML, JavaScript, AJAX

Business Tools: Tableau, Crystal Reports, Dashboard Design, WebI Rich Client

PROFESSIONAL EXPERIENCE

Confidential - Jacksonville, FL

Sr. Data Engineer

Responsibilities:

  • Involved in the complete Big Data flow of the application, from data ingestion from upstream systems into HDFS through processing and analyzing the data in HDFS.
  • Created and maintained optimal data pipeline architecture in the Microsoft Azure cloud.
  • Designed and created Azure Data Factory (ADF) pipelines extensively for ingesting data from relational and non-relational source systems to meet business functional requirements.
  • Extracted, transformed and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, Spark SQL and Azure Data Lake Analytics.
  • Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression, then loaded data into the Parquet Hive tables from Avro Hive tables (a minimal sketch of this load follows this list).
  • Developed Spark core and Spark SQL scripts using Scala for faster data processing.
  • Developed ADF pipelines to load data from on-premises systems to Azure cloud storage and databases.
  • Exposed transformed data on the Azure Databricks (Spark) platform in Parquet format for efficient data storage.
  • Built the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and 'big data' technologies like Hadoop Hive, Azure Data Lake storage.
  • Migrated projects from Cloudera Hadoop Hive storage to Azure Data Lake Store to support Confidential's digital transformation strategy.
  • Involved in running Hive scripts through Hive, Impala and Hive on Spark, and some through Spark SQL.
  • Building distributed, reliable and scalable data pipelines to ingest and process data in real-time using Spark-Streaming and Kafka.
  • Developed a Kafka consumer API in Scala to consume data from Kafka topics and load it into a DB2 database.
  • Involved in designing and developing tables in HBase and storing aggregated data from Hive Table.
  • Developed Sqoop jobs to import data in Avro file format from Oracle database to HDFS and created hive tables on top of it.
  • Developed shell scripts to automate the Control-M jobs according to their frequency (daily, weekly, monthly or annual).
  • Deployed the applications on production servers using Jenkins as continuous deployment tool.
  • Provided on-call support for the application post-production, resolving data issues and troubleshooting Hadoop ecosystem runtime issues.
  • Involved in performance tuning of Hive from design, storage and query perspectives.
  • Collected JSON data from an HTTP source and developed Spark APIs that help perform inserts and updates in Hive tables.
  • Preparing the detailed software application architecture and technical design documents as part of SDLC process.
  • Developed Oozie workflows to ingest/parse the raw data, populate staging tables and store the refined data in partitioned Hive tables.
  • Responsible for defining, developing, and communicating key metrics and business trends to partner and management teams.
  • Containerized data wrangling jobs in Docker containers, utilizing Git and Azure DevOps for version control.
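
A minimal Scala/Spark SQL sketch of the Avro-to-Parquet Hive load mentioned above (database, table and partition column names are assumed for illustration; the job presumes a cluster with Hive support enabled):

    import org.apache.spark.sql.SparkSession

    object AvroToParquetHive {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("avro-to-parquet-hive")
          .config("spark.sql.parquet.compression.codec", "snappy")  // write Snappy-compressed Parquet
          .enableHiveSupport()
          .getOrCreate()

        // Allow dynamic partitions so each load_date value lands in its own partition directory.
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

        // Read the Avro-backed Hive staging table and append it to a partitioned Parquet table.
        spark.table("staging.claims_avro")
          .write
          .mode("append")
          .format("parquet")
          .partitionBy("load_date")
          .saveAsTable("warehouse.claims_parquet")

        spark.stop()
      }
    }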

Environment: Hadoop, Map Reduce, HDFS, Hive, Cassandra, Sqoop, Oozie, SQL, S3, Kafka, Spark, Scala, Java, Azure, Jenkins, EMR, GitHub, Azure Data Factory, Azure Data Lake storage.

Confidential - Minneapolis, MN

Data Engineer

Responsibilities:

  • Developed Spark jobs using Scala in test environment for faster real time analytics and used Spark SQL for querying.
  • Created Hive tables based on partitioning requirements and scheduled the data ingestion jobs.
  • Was responsible for creating on-demand tables on S3 files using Lambda Functions and AWS Glue using Python and PySpark.
  • Created monitors, alarms, notifications and logs for Lambda functions, Glue jobs and EC2 hosts using CloudWatch.
  • Used AWS Glue for data transformation, validation and cleansing.
  • Transformed the data received from source systems using Scala and Python per the business transformation rules and saved the data into files to be loaded into Hive.
  • Developed shell scripts to automate the daily jobs and weekly jobs.
  • Developed Spark scripts to import large files from Amazon S3 buckets.
  • Used Map Reduce and Sqoop to load, aggregate, store and analyze web log data from different web servers.
  • Wrote scripts and an indexing strategy for a migration from SQL Server and MySQL databases to Amazon Redshift.
  • Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
  • Used JSON schemas to define table and column mappings from S3 data to Redshift.
  • Used Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS and in databases such as HBase using Scala (a minimal sketch of the Kafka-to-HDFS path follows this list).
  • Used Spark and Spark SQL to read Parquet data and create tables in Hive using the Scala API.
  • Created Hive tables as per requirements, either internal or external, defined with appropriate static or dynamic partitions and bucketing for efficiency.
  • Developed framework to import the data from database to HDFS using Sqoop. Developed HQLs to extract data from Hive tables for reporting.
  • Hands-on experience in writing MapReduce jobs to cleanse the data and copy it from our cluster to the AWS cluster.
  • Moved Relational Database data using Sqoop into Hive Dynamic partition tables using staging tables.
  • Worked on creating Hive tables and writing Hive queries for data analysis to meet business requirements; experienced in using Sqoop to import and export data from Oracle and MySQL.
  • Developed Hive queries for analysts by loading and transforming large sets of structured and semi-structured data using Hive.
  • Worked on Spark SQL to join multiple Hive tables, write the results to a final Hive table and store them on S3.
  • Performed querying of both managed and external tables created by Hive using Impala.
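
The Kafka-to-HDFS streaming path described above can be sketched in Scala with Spark Structured Streaming's Kafka source roughly as follows (broker, topic and paths are placeholders; the job assumes the spark-sql-kafka connector is on the classpath):

    import org.apache.spark.sql.SparkSession

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("kafka-to-hdfs")
          .getOrCreate()

        // Subscribe to the Kafka topic as a streaming DataFrame.
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "web-events")
          .load()
          .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

        // Land the raw stream on HDFS as Parquet; the checkpoint keeps the sink fault tolerant.
        val query = events.writeStream
          .format("parquet")
          .option("path", "hdfs:///data/raw/web_events")
          .option("checkpointLocation", "hdfs:///checkpoints/web_events")
          .start()

        query.awaitTermination()
      }
    }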

Environment: Hadoop 2, HDFS, Spark 2.2, Scala, AWS, S3, Redshift, Python, EC2, Java, Kafka, Hive, HiveQL, Oozie, Sqoop, Impala, Git, HBase.

Confidential - St. Louis, MO

Big Data Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop Map-Reduce, HDFS and developed multiple Map-Reduce jobs in Java for data cleansing and preprocessing.
  • Designed and developed database objects like Tables, Views, Stored Procedures, User Functions using PL/SQL, SQL Developer and used them in WEB components.
  • Implemented Spark using Python and Spark SQL for faster data processing and for real-time analysis algorithms in Spark.
  • Developed a Spark API to import data into HDFS from Teradata and create Hive tables (a minimal sketch of this load follows this list).
  • Load and transform large sets of structured, semi structured and unstructured data.
  • Supported MapReduce programs running on the cluster.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Involved in the architecture and design of a distributed time-series database platform using NoSQL technologies such as Hadoop/HBase and ZooKeeper.
  • Involved in creating Hive tables, working on them using HiveQL and performing data analysis using Hive and Pig.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Experience in managing and reviewing Hadoop log files.
  • Worked on Apache Flume for collecting and aggregating huge amounts of log data and storing it on HDFS for further analysis.
  • Managed real-time data processing and real-time data ingestion in MongoDB and Hive using Storm.
  • Developed data pipelines using Flume, Sqoop and Pig to extract data from web logs and store it in HDFS.
  • Imported data from Teradata to HDFS through Informatica mappings using Unix scripts.
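
The Teradata-to-HDFS/Hive import above can be sketched in Scala with Spark's JDBC source (host, credentials and table names are placeholders; the Teradata JDBC driver is assumed to be on the classpath):

    import org.apache.spark.sql.SparkSession

    object TeradataToHive {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("teradata-to-hive")
          .enableHiveSupport()
          .getOrCreate()

        // Pull the source table over JDBC from Teradata (placeholder host and credentials).
        val source = spark.read
          .format("jdbc")
          .option("driver", "com.teradata.jdbc.TeraDriver")
          .option("url", "jdbc:teradata://td-host/DATABASE=sales")
          .option("dbtable", "sales.transactions")
          .option("user", sys.env("TD_USER"))
          .option("password", sys.env("TD_PASSWORD"))
          .load()

        // Land the data on HDFS as a Parquet-backed Hive table.
        source.write
          .mode("overwrite")
          .format("parquet")
          .saveAsTable("staging.transactions")

        spark.stop()
      }
    }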

Environment: Hadoop, Kafka, Spark, Sqoop, Spark SQL, Spark Streaming, Hive, Scala, Pig, NoSQL, Impala, Oozie, HBase, ZooKeeper.

Confidential

SQL Developer

Responsibilities:

  • Optimized SQL queries for improved performance and availability.
  • Created backup and restore operations.
  • Optimized long running stored procedures and queries for effective data retrieval.
  • Performed analysis and presented results using SQL, SSIS, MS Access, Excel, and Visual Basic scripts.
  • Used SSIS data flow transformations such as Derived Column, Data Conversion, Merge Join, Union All, Conditional Split, Lookup and Row Count.
  • Created or modified the ETL processes to achieve the daily/monthly loading requirements.
  • Involved in performance tuning and optimization of the SQL Server application using Performance Monitor.
  • Created views and business logic to reduce database complexity for easier reporting.
  • Wrote and automated tools and scripts to increase departmental efficiency and automate repeatable tasks.
  • Supported and assisted QA engineers in understanding, testing and troubleshooting.
  • Wrote build scripts using Ant and participated in the deployment of production systems.
  • Responsible for managing data coming from different sources.

Environment: SQL, SSIS, ETL, DDL, MS SQL Server 2012/R2/2008, Toad for DB2.
