Sr. Data Engineer Resume
Jacksonville, FL
SUMMARY
- Over 8 years of technical IT experience in all phases of the Software Development Life Cycle (SDLC), with skills in data analysis, design, development, testing, and deployment of software systems.
- Designed and implemented data ingestion techniques for data coming from various data sources.
- 2+ years of experience with Microsoft Azure cloud services such as SQL Data Warehouse, Azure SQL Server, Azure Databricks, Azure Data Lake, Azure Blob Storage, and Azure Data Factory.
- Hands-on experience with big data Hadoop ecosystem components such as Spark, Hive, Sqoop, Kafka, Flume, ZooKeeper, HBase, and MapReduce.
- Extensive understanding of Hadoop architecture, workload management, schedulers, scalability, and components such as YARN and MapReduce; strong SQL programming skills, including HiveQL for Hive and HBase.
- Experience in importing and exporting data into HDFS and Hive using Sqoop.
- Experience converting SQL queries into Spark transformations using Spark RDDs and Scala, including map-side (broadcast) joins on RDDs (a sketch follows this summary).
- Experience working with Teradata and batch-processing its data using distributed computing.
- Experience in importing and exporting data using Sqoop between HDFS and relational/non-relational database systems.
- Experience with distributed systems, large-scale non-relational data stores, MapReduce systems, data modeling, and big data systems.
- Hands-on experience creating data pipelines using Kafka, Flume, and Storm for security fraud and compliance-violation use cases.
- Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
- Experienced in loading data into Hive partitions and creating buckets in Hive.
- Experience developing Kafka producers and consumers that stream millions of events per second.
- Experience with job/workflow scheduling and monitoring tools such as Oozie, AWS Data Pipeline, and Autosys.
- Defined and deployed monitoring, metrics, and logging systems on AWS.
- Experience in server infrastructure development on the AWS cloud, with extensive use of Virtual Private Cloud (VPC), IAM, EC2, RDS, S3, SQS, SNS, and Lambda, focusing on high availability, fault tolerance, and auto-scaling with CloudWatch monitoring.
- Experience in the design and implementation of Continuous Integration, Continuous Delivery, Continuous Deployment, and DevOps processes for Agile projects using Jenkins, Git, and Docker.
- Experienced in performing analytics on time-series data using HBase and the Java API.
- Ability to tune Big Data solutions to improve performance and end-user experience.
- Hands-on experience with AWS infrastructure services, including Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2), as well as Microsoft Azure.
- Involved in creating Hive tables, loading them with data, and writing ad-hoc Hive queries that run internally on MapReduce and Tez.
- Knowledge of job workflow management and coordinating tools like Oozie.
- Experience working with NoSQL database technologies, including MongoDB, Cassandra, and HBase.
- Strong understanding of logical and physical database models and entity-relationship modeling.
- Worked in Agile methodology and used JIRA for Development and tracking the project.
- Good development experience with Agile, SDLC, and Waterfall methodologies.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning.
- Hands-on experience with VPN, PuTTY, WinSCP, etc. Responsible for creating Hive tables based on business requirements.
- Worked on multiple stages of Software Development Life Cycle including Development, Component Integration, Performance Testing, Deployment and Support Maintenance.
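The map-side join noted above can be illustrated with a minimal Spark/Scala sketch. All table names and values here are hypothetical; the idea is simply to broadcast the small RDD so the large RDD is joined without a shuffle.

```scala
import org.apache.spark.sql.SparkSession

object MapSideJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("map-side-join-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical inputs: a large fact RDD of (customerId, amount)
    // and a small dimension RDD of (customerId, region).
    val orders  = sc.parallelize(Seq((1, 250.0), (2, 75.5), (1, 30.0)))
    val regions = sc.parallelize(Seq((1, "SOUTH"), (2, "WEST")))

    // Broadcast the small side so the join happens map-side,
    // avoiding a shuffle of the large RDD.
    val regionLookup = sc.broadcast(regions.collectAsMap())

    val joined = orders.map { case (custId, amount) =>
      (custId, amount, regionLookup.value.getOrElse(custId, "UNKNOWN"))
    }

    joined.collect().foreach(println)
    spark.stop()
  }
}
```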
TECHNICAL SKILLS
Big Data Ecosystem: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Cassandra, Impala, Oozie, ZooKeeper, MapR, Amazon Web Services (AWS), EMR, Azure
IDEs: Eclipse, IntelliJ, Spyder, Jupyter
Operating Systems: Windows, Linux
Programming Languages: Python, Scala, Linux shell scripts, PL/SQL, C, C++, Java
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS SQL Server, Netezza, HBase, Hive, MongoDB, Postgres, Azure SQL Data Warehouse
Web Servers: WebLogic, WebSphere, Apache Tomcat
Web Technologies: HTML, XML, JavaScript, AJAX
Business Tools: Tableau, Crystal Reports, Dashboard Design, WebI Rich Client
PROFESSIONAL EXPERIENCE
Confidential - Jacksonville, FL
Sr. Data Engineer
Responsibilities:
- Involved in the complete big data flow of the application, from ingesting upstream data into HDFS to processing and analyzing the data in HDFS.
- Created and maintained optimal data pipeline architecture in the Microsoft Azure cloud.
- Designed and created Azure Data Factory (ADF) pipelines extensively for ingesting data from relational and non-relational source systems to meet business functional requirements.
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, Spark SQL, and Azure Data Lake Analytics.
- Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression, and then loaded data into the Parquet Hive tables from Avro Hive tables (a loader sketch follows this section).
- Developed Spark core and Spark SQL scripts using Scala for faster data processing.
- Developed ADF pipelines to load data from on-premises sources to Azure cloud storage and databases.
- Exposed transformed data on the Azure Databricks platform in Parquet format for efficient data storage.
- Built the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and big data technologies such as Hadoop Hive and Azure Data Lake Storage.
- Migrated projects from Cloudera Hadoop Hive storage to Azure Data Lake Store to support Confidential's digital transformation strategy.
- Involved in running Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL.
- Building distributed, reliable and scalable data pipelines to ingest and process data in real-time using Spark-Streaming and Kafka.
- Developed a Kafka consumer API in Scala for consuming data from Kafka topics and loading it into a DB2 database (a consumer sketch follows this section).
- Involved in designing and developing tables in HBase and storing aggregated data from Hive Table.
- Developed Sqoop jobs to import data in Avro file format from the Oracle database to HDFS and created Hive tables on top of it.
- Developed shell scripts to automate Control-M jobs based on frequency: daily, weekly, monthly, and annual.
- Deployed the applications on production servers using Jenkins as continuous deployment tool.
- Provided on-call support for the application post-production to resolve data issues and troubleshoot Hadoop ecosystem runtime issues.
- Involved in performance tuning of Hive from design, storage and query perspectives.
- Collected JSON data from an HTTP source and developed Spark APIs that help perform inserts and updates in Hive tables.
- Prepared detailed software application architecture and technical design documents as part of the SDLC process.
- Developed Oozie workflows to ingest/parse the raw data, populate staging tables and store the refined data in partitioned tables in the Hive.
- Responsible for defining, developing, and communicating key metrics and business trends to partner and management teams.
- Containerized data-wrangling jobs in Docker containers, utilizing Git and Azure DevOps for version control.
Environment: Hadoop, MapReduce, HDFS, Hive, Cassandra, Sqoop, Oozie, SQL, S3, Kafka, Spark, Scala, Java, Azure, Jenkins, EMR, GitHub, Azure Data Factory, Azure Data Lake Storage.
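As referenced in the bullet on partitioned and bucketed Parquet tables, the following is a minimal sketch of one way such a load could be expressed with the Spark DataFrame writer. The staging and target table names, columns, and bucket count are assumptions, and the bucketing shown is Spark's own bucketing via bucketBy.

```scala
import org.apache.spark.sql.SparkSession

object AvroToParquetLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("avro-to-parquet-load")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical staging (Avro-backed) and target table names and columns.
    spark.table("staging.sales_avro")
      .write
      .mode("overwrite")
      .format("parquet")
      .option("compression", "snappy")   // Snappy-compressed Parquet files
      .partitionBy("load_date")          // partition column
      .bucketBy(32, "customer_id")       // bucket column and bucket count
      .saveAsTable("curated.sales_parquet")

    spark.stop()
  }
}
```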
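The Kafka-to-DB2 consumer mentioned above could follow a pattern like this minimal Scala sketch using the plain Kafka consumer client and JDBC. The broker, topic, DB2 connection details, and target table are hypothetical, the DB2 JDBC driver is assumed to be on the classpath, and offsets are committed only after a successful batch insert.

```scala
import java.time.Duration
import java.util.Properties
import java.sql.DriverManager
import org.apache.kafka.clients.consumer.KafkaConsumer
import scala.collection.JavaConverters._

object EventConsumer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")   // hypothetical broker
    props.put("group.id", "event-loader")
    props.put("enable.auto.commit", "false")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(java.util.Collections.singletonList("events"))   // hypothetical topic

    // Hypothetical DB2 connection and target table.
    val conn = DriverManager.getConnection("jdbc:db2://db2host:50000/EVENTS", "etl_user", "secret")
    val stmt = conn.prepareStatement("INSERT INTO EVENT_LOG (EVENT_KEY, PAYLOAD) VALUES (?, ?)")

    try {
      while (true) {
        val records = consumer.poll(Duration.ofMillis(500)).asScala
        for (record <- records) {
          stmt.setString(1, record.key())
          stmt.setString(2, record.value())
          stmt.addBatch()
        }
        if (records.nonEmpty) {
          stmt.executeBatch()    // write the micro-batch to DB2
          consumer.commitSync()  // commit offsets only after a successful load
        }
      }
    } finally {
      stmt.close(); conn.close(); consumer.close()
    }
  }
}
```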
Confidential - Minneapolis, MN
Data Engineer
Responsibilities:
- Developed Spark jobs using Scala in test environment for faster real time analytics and used Spark SQL for querying.
- Created Hive tables with partitioning and scheduled the data ingestion jobs.
- Responsible for creating on-demand tables over S3 files using Lambda functions and AWS Glue with Python and PySpark.
- Created monitors, alarms, notifications, and logs for Lambda functions, Glue jobs, and EC2 hosts using CloudWatch.
- Used AWS Glue for data transformation, validation, and cleansing.
- Transformed data received from source systems using Scala and Python per business transformation rules and saved it to files for loading into Hive.
- Developed shell scripts to automate the daily jobs and weekly jobs.
- Developed Spark scripts to import large files from Amazon S3 buckets.
- Used Map Reduce and Sqoop to load, aggregate, store and analyze web log data from different web servers.
- Wrote scripts and an indexing strategy for a migration to Redshift from SQL Server and MySQL databases.
- Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
- Used JSON schema to define table and column mapping from S3 data to Redshift.
- Used Spark Streaming in Scala to receive real-time data from Kafka and store the stream data in HDFS and databases such as HBase (a streaming sketch follows this section).
- Used Spark and Spark SQL with the Scala API to read Parquet data and create tables in Hive.
- Created Hive tables as internal or external tables per requirements, defined with appropriate static or dynamic partitions and bucketing for efficiency.
- Developed a framework to import data from databases to HDFS using Sqoop, and developed HQL queries to extract data from Hive tables for reporting.
- Hands-on experience writing MapReduce jobs to cleanse data and copy it from our cluster to the AWS cluster.
- Moved Relational Database data using Sqoop into Hive Dynamic partition tables using staging tables.
- Created Hive tables and wrote Hive queries for data analysis to meet business requirements; experienced in using Sqoop to import and export data from Oracle and MySQL.
- Developed Hive queries for analysts by loading and transforming large sets of structured and semi-structured data using Hive.
- Worked on Spark SQL to join multiple Hive tables, write the results to a final Hive table, and store them on S3.
- Performed querying of both managed and external tables created by Hive using Impala.
Environment: Hadoop 2, HDFS, Spark 2.2, Scala, AWS, S3, Redshift, Python, EC2, Java, Kafka, Hive, HiveQL, Oozie, Sqoop, Impala, Git, HBase.
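The Spark Streaming ingestion from Kafka to HDFS described above could be outlined roughly as below, using the spark-streaming-kafka-0-10 integration; the broker, topic, batch interval, and HDFS path are assumptions.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs")
    val ssc  = new StreamingContext(conf, Seconds(30))    // hypothetical batch interval

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",             // hypothetical broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "clickstream-ingest",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("clickstream"), kafkaParams))

    // Persist each non-empty micro-batch of record values to a timestamped HDFS directory.
    stream.map(_.value()).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) {
        rdd.saveAsTextFile(s"hdfs:///data/raw/clickstream/batch_${time.milliseconds}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```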
Confidential - St. Louis, MO
Big Data Hadoop Developer
Responsibilities:
- Installed and configured Hadoop Map-Reduce, HDFS and developed multiple Map-Reduce jobs in Java for data cleansing and preprocessing.
- Designed and developed database objects like Tables, Views, Stored Procedures, User Functions using PL/SQL, SQL Developer and used them in WEB components.
- Implemented Spark with Python and Spark SQL for faster data processing and for algorithms used in real-time analysis.
- Developed a Spark API to import data into HDFS from Teradata and create Hive tables (a JDBC import sketch follows this section).
- Load and transform large sets of structured, semi structured and unstructured data.
- Supported MapReduce programs running on the cluster.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Involved in the architecture and design of a distributed time-series database platform using NoSQL technologies such as Hadoop/HBase and ZooKeeper.
- Involved in creating Hive tables, working on them using HiveQL, and performing data analysis using Hive and Pig.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Experience in managing and reviewing Hadoop log files.
- Worked on Apache Flume for collecting and aggregating huge amount of log data and stored it on HDFS for doing further analysis.
- Managed real time data processing and real time Data Ingestion in MongoDB and Hive using Storm.
- Developed data pipelines using Flume, Sqoop, and Pig to extract data from web logs and store it in HDFS.
- Imported data from Teradata to HDFS through Informatica mappings using Unix scripts.
Environment: Hadoop, Kafka, Spark, Sqoop, Spark SQL, Spark Streaming, Hive, Scala, Pig, NoSQL, Impala, Oozie, HBase, ZooKeeper.
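The Teradata-to-HDFS import via a Spark API mentioned above can be sketched minimally with Spark's JDBC data source. The Teradata host, credentials, source table, partitioning bounds, and target Hive table are assumptions, and the Teradata JDBC driver is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object TeradataImport {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("teradata-import")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Teradata connection and source table; reads are
    // parallelized across 8 JDBC partitions on TXN_ID.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:teradata://td-host/DATABASE=SALES")
      .option("driver", "com.teradata.jdbc.TeraDriver")
      .option("dbtable", "SALES.DAILY_TXN")
      .option("user", "etl_user")
      .option("password", "secret")
      .option("numPartitions", "8")
      .option("partitionColumn", "TXN_ID")
      .option("lowerBound", "1")
      .option("upperBound", "100000000")
      .load()

    // Land the data in HDFS as a Parquet-backed Hive table for downstream HiveQL.
    df.write.mode("overwrite").format("parquet").saveAsTable("landing.daily_txn")

    spark.stop()
  }
}
```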
Confidential
SQL Developer
Responsibilities:
- Optimized SQL queries for improved performance and availability.
- Created backup and restore operations.
- Optimized long running stored procedures and queries for effective data retrieval.
- Performed analysis and presented results using SQL, SSIS, MS Access, Excel, and Visual Basic scripts.
- Used SSIS data flow transformations such as Derived Column, Data Conversion, Merge Join, Union All, Conditional Split, Lookup, and Row Count.
- Created or modified the ETL processes to achieve the daily/monthly loading requirements.
- Involved in performance tuning and optimization of the SQL Server application using Performance Monitor.
- Created views and business logic to reduce database complexity for easier reporting.
- Wrote and automated tools and scripts to increase departmental efficiency and automate repeatable tasks.
- Supported and assisted QA engineers in understanding, testing, and troubleshooting.
- Wrote build scripts using Ant and participated in the deployment of one or more production systems.
- Responsible for managing data coming from different sources.
Environment: SQL, SSIS, ETL, DDL, MS SQL server 2012/R2/2008, Toad for DB2.