Sr. Data Engineer Resume
Jacksonville, FL
SUMMARY
- Over 8 years of technical IT experience in all phases of the Software Development Life Cycle (SDLC), with skills in data analysis, design, development, testing, and deployment of software systems.
- Designed and implemented data ingestion techniques for data coming from various data sources.
- 2+ years of experience with Microsoft Azure cloud services such as SQL Data Warehouse, Azure SQL Server, Azure Databricks, Azure Data Lake, Azure Blob Storage, and Azure Data Factory.
- Hands-on experience with big data Hadoop ecosystem components such as Spark, Hive, Sqoop, Kafka, Flume, ZooKeeper, HBase, and MapReduce.
- Extensive understanding of Hadoop architecture, workload management, schedulers, scalability, and components such as YARN and MapReduce; strong SQL programming skills, including HiveQL for Hive and HBase.
- Experience in importing and exporting data into HDFS and Hive using Sqoop.
- Experience converting SQL queries into Spark transformations using Spark RDDs and Scala, including map-side (broadcast) joins on RDDs (a sketch follows this summary).
- Experience working with Teradata and batch-processing its data using distributed computing.
- Experience in importing and exporting data using Sqoop between HDFS and relational/non-relational database systems.
- Experience with distributed systems, large-scale non-relational data stores, MapReduce systems, data modeling, and big data systems.
- Hands-on experience creating data pipelines using Kafka, Flume, and Storm for security fraud and compliance-violation use cases.
- Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
- Experienced in loading data into Hive partitions and creating buckets in Hive.
- Experience developing Kafka producers and consumers that stream millions of events per second.
- Experience with job/workflow scheduling and monitoring tools such as Oozie, AWS Data Pipeline, and Autosys.
- Defined and deployed monitoring, metrics, and logging systems on AWS.
- Experience in server infrastructure development on the AWS cloud, with extensive use of Virtual Private Cloud (VPC), IAM, EC2, RDS, S3, SQS, SNS, and Lambda, focusing on high availability, fault tolerance, and auto-scaling with CloudWatch monitoring.
- Experience in the design and implementation of Continuous Integration, Continuous Delivery, Continuous Deployment, and DevOps processes for Agile projects using Jenkins, Git, and Docker.
- Experienced in performing analytics on time-series data using HBase and the Java API.
- Ability to tune Big Data solutions to improve performance and end-user experience.
- Hands-on experience with AWS infrastructure services, including Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2), as well as Microsoft Azure.
- Involved in creating Hive tables, loading them with data, and writing ad-hoc Hive queries that run internally on MapReduce and Tez.
- Knowledge of job workflow management and coordinating tools like Oozie.
- Experience working with NoSQL database technologies, including MongoDB, Cassandra, and HBase.
- Strong understanding of logical and physical database models and entity-relationship modeling.
- Worked in Agile methodology and used JIRA for Development and tracking the project.
- Good development experience with Agile, SDLC, and Waterfall methodologies.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning.
- Hands-on experience with VPN, PuTTY, WinSCP, etc. Responsible for creating Hive tables based on business requirements.
- Worked on multiple stages of Software Development Life Cycle including Development, Component Integration, Performance Testing, Deployment and Support Maintenance.
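The map-side join noted above can be illustrated with a minimal Spark/Scala sketch. All table names and values here are hypothetical; the idea is simply to broadcast the small RDD so the large RDD is joined without a shuffle.

```scala
import org.apache.spark.sql.SparkSession

object MapSideJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("map-side-join-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical inputs: a large fact RDD of (customerId, amount)
    // and a small dimension RDD of (customerId, region).
    val orders  = sc.parallelize(Seq((1, 250.0), (2, 75.5), (1, 30.0)))
    val regions = sc.parallelize(Seq((1, "SOUTH"), (2, "WEST")))

    // Broadcast the small side so the join happens map-side,
    // avoiding a shuffle of the large RDD.
    val regionLookup = sc.broadcast(regions.collectAsMap())

    val joined = orders.map { case (custId, amount) =>
      (custId, amount, regionLookup.value.getOrElse(custId, "UNKNOWN"))
    }

    joined.collect().foreach(println)
    spark.stop()
  }
}
```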
TECHNICAL SKILLS
Big Data Ecosystem: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Cassandra, Impala, Oozie, ZooKeeper, MapR, Amazon Web Services (AWS), EMR, Azure
IDEs: Eclipse, IntelliJ, Spyder, Jupyter
Operating Systems: Windows, Linux
Programming Languages: Python, Scala, Linux shell scripts, PL/SQL, C, C++, Java
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS SQL Server, Netezza, HBase, Hive, MongoDB, Postgres, Azure SQL Data Warehouse
Web Servers: WebLogic, WebSphere, Apache Tomcat
Web Technologies: HTML, XML, JavaScript, AJAX
Business Tools: Tableau, Crystal Reports, Dashboard Design, WebI Rich Client
PROFESSIONAL EXPERIENCE
Confidential - Jacksonville, FL
Sr. Data Engineer
Responsibilities:
- Involved in the complete big data flow of the application, from ingesting upstream data into HDFS to processing and analyzing the data in HDFS.
- Created and maintained optimal data pipeline architecture in the Microsoft Azure cloud.
- Designed and created Azure Data Factory (ADF) pipelines extensively for ingesting data from relational and non-relational source systems to meet business functional requirements.
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, Spark SQL, and Azure Data Lake Analytics.
- Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression, and then loaded data into the Parquet Hive tables from Avro Hive tables (a loader sketch follows this section).
- Developed Spark core and Spark SQL scripts using Scala for faster data processing.
- Developed ADF pipelines to load data from on-premises sources to Azure cloud storage and databases.
- Exposed transformed data on the Azure Databricks platform in Parquet format for efficient data storage.
- Built the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and big data technologies such as Hadoop Hive and Azure Data Lake Storage.
- Migrated projects from Cloudera Hadoop Hive storage to Azure Data Lake Store to support Confidential's digital transformation strategy.
- Involved in running Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL.
- Building distributed, reliable and scalable data pipelines to ingest and process data in real-time using Spark-Streaming and Kafka.
- Developed a Kafka consumer API in Scala for consuming data from Kafka topics and loading it into a DB2 database (a consumer sketch follows this section).
- Involved in designing and developing tables in HBase and storing aggregated data from Hive Table.
- Developed Sqoop jobs to import data in Avro file format from the Oracle database to HDFS and created Hive tables on top of it.
- Developed shell scripts to automate Control-M jobs based on frequency: daily, weekly, monthly, and annual.
- Deployed the applications on production servers using Jenkins as continuous deployment tool.
- Provided on-call support for the application post-production to resolve data issues and troubleshoot Hadoop ecosystem runtime issues.
- Involved in performance tuning of Hive from design, storage and query perspectives.
- Collected JSON data from an HTTP source and developed Spark APIs that help perform inserts and updates in Hive tables.
- Prepared detailed software application architecture and technical design documents as part of the SDLC process.
- Developed Oozie workflows to ingest/parse the raw data, populate staging tables and store the refined data in partitioned tables in the Hive.
- Responsible for defining, developing, and communicating key metrics and business trends to partner and management teams.
- Containerized data-wrangling jobs in Docker containers, utilizing Git and Azure DevOps for version control.
Environment: Hadoop, MapReduce, HDFS, Hive, Cassandra, Sqoop, Oozie, SQL, S3, Kafka, Spark, Scala, Java, Azure, Jenkins, EMR, GitHub, Azure Data Factory, Azure Data Lake Storage.
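As referenced in the bullet on partitioned and bucketed Parquet tables, the following is a minimal sketch of one way such a load could be expressed with the Spark DataFrame writer. The staging and target table names, columns, and bucket count are assumptions, and the bucketing shown is Spark's own bucketing via bucketBy.

```scala
import org.apache.spark.sql.SparkSession

object AvroToParquetLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("avro-to-parquet-load")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical staging (Avro-backed) and target table names and columns.
    spark.table("staging.sales_avro")
      .write
      .mode("overwrite")
      .format("parquet")
      .option("compression", "snappy")   // Snappy-compressed Parquet files
      .partitionBy("load_date")          // partition column
      .bucketBy(32, "customer_id")       // bucket column and bucket count
      .saveAsTable("curated.sales_parquet")

    spark.stop()
  }
}
```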
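The Kafka-to-DB2 consumer mentioned above could follow a pattern like this minimal Scala sketch using the plain Kafka consumer client and JDBC. The broker, topic, DB2 connection details, and target table are hypothetical, the DB2 JDBC driver is assumed to be on the classpath, and offsets are committed only after a successful batch insert.

```scala
import java.time.Duration
import java.util.Properties
import java.sql.DriverManager
import org.apache.kafka.clients.consumer.KafkaConsumer
import scala.collection.JavaConverters._

object EventConsumer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")   // hypothetical broker
    props.put("group.id", "event-loader")
    props.put("enable.auto.commit", "false")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(java.util.Collections.singletonList("events"))   // hypothetical topic

    // Hypothetical DB2 connection and target table.
    val conn = DriverManager.getConnection("jdbc:db2://db2host:50000/EVENTS", "etl_user", "secret")
    val stmt = conn.prepareStatement("INSERT INTO EVENT_LOG (EVENT_KEY, PAYLOAD) VALUES (?, ?)")

    try {
      while (true) {
        val records = consumer.poll(Duration.ofMillis(500)).asScala
        for (record <- records) {
          stmt.setString(1, record.key())
          stmt.setString(2, record.value())
          stmt.addBatch()
        }
        if (records.nonEmpty) {
          stmt.executeBatch()    // write the micro-batch to DB2
          consumer.commitSync()  // commit offsets only after a successful load
        }
      }
    } finally {
      stmt.close(); conn.close(); consumer.close()
    }
  }
}
```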
Confidential - Minneapolis, MN
Data Engineer
Responsibilities:
- Developed Spark jobs using Scala in test environment for faster real time analytics and used Spark SQL for querying.
- Created Hive tables with partitioning and scheduled the data ingestion jobs.
- Responsible for creating on-demand tables over S3 files using Lambda functions and AWS Glue with Python and PySpark.
- Created monitors, alarms, notifications, and logs for Lambda functions, Glue jobs, and EC2 hosts using CloudWatch.
- Used AWS Glue for data transformation, validation, and cleansing.
- Transformed data received from source systems using Scala and Python per business transformation rules and saved it to files for loading into Hive.
- Developed shell scripts to automate the daily jobs and weekly jobs.
- Developed Spark scripts to import large files from Amazon S3 buckets.
- Used Map Reduce and Sqoop to load, aggregate, store and analyze web log data from different web servers.
- Wrote scripts and an indexing strategy for a migration to Redshift from SQL Server and MySQL databases.
- Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
- Used JSON schema to define table and column mapping from S3 data to Redshift.
- Used Spark Streaming in Scala to receive real-time data from Kafka and store the stream data in HDFS and databases such as HBase (a streaming sketch follows this section).
- Used Spark and Spark SQL with the Scala API to read Parquet data and create tables in Hive.
- Created Hive tables as internal or external tables per requirements, defined with appropriate static or dynamic partitions and bucketing for efficiency.
- Developed a framework to import data from databases to HDFS using Sqoop, and developed HQL queries to extract data from Hive tables for reporting.
- Hands-on experience writing MapReduce jobs to cleanse data and copy it from our cluster to the AWS cluster.
- Moved Relational Database data using Sqoop into Hive Dynamic partition tables using staging tables.
- Created Hive tables and wrote Hive queries for data analysis to meet business requirements; experienced in using Sqoop to import and export data from Oracle and MySQL.
- Developed Hive queries for analysts by loading and transforming large sets of structured and semi-structured data using Hive.
- Worked on Spark SQL to join multiple Hive tables, write the results to a final Hive table, and store them on S3.
- Performed querying of both managed and external tables created by Hive using Impala.
Environment: Hadoop 2, HDFS, Spark 2.2, Scala, AWS, S3, Redshift, Python, EC2, Java, Kafka, Hive, HiveQL, Oozie, Sqoop, Impala, Git, HBase.
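The Spark Streaming ingestion from Kafka to HDFS described above could be outlined roughly as below, using the spark-streaming-kafka-0-10 integration; the broker, topic, batch interval, and HDFS path are assumptions.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs")
    val ssc  = new StreamingContext(conf, Seconds(30))    // hypothetical batch interval

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",             // hypothetical broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "clickstream-ingest",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("clickstream"), kafkaParams))

    // Persist each non-empty micro-batch of record values to a timestamped HDFS directory.
    stream.map(_.value()).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) {
        rdd.saveAsTextFile(s"hdfs:///data/raw/clickstream/batch_${time.milliseconds}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```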
Confidential - St. Louis, MO
Big Data Hadoop Developer
Responsibilities:
- Installed and configured Hadoop Map-Reduce, HDFS and developed multiple Map-Reduce jobs in Java for data cleansing and preprocessing.
- Designed and developed database objects like Tables, Views, Stored Procedures, User Functions using PL/SQL, SQL Developer and used them in WEB components.
- Implemented Spark with Python and Spark SQL for faster data processing and for algorithms used in real-time analysis.
- Developed a Spark API to import data into HDFS from Teradata and create Hive tables (a JDBC import sketch follows this section).
- Load and transform large sets of structured, semi structured and unstructured data.
- Supported MapReduce programs running on the cluster.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Involved in the architecture and design of a distributed time-series database platform using NoSQL technologies such as Hadoop/HBase and ZooKeeper.
- Involved in creating Hive tables, working on them using HiveQL, and performing data analysis using Hive and Pig.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Experience in managing and reviewing Hadoop log files.
- Worked on Apache Flume for collecting and aggregating huge amount of log data and stored it on HDFS for doing further analysis.
- Managed real time data processing and real time Data Ingestion in MongoDB and Hive using Storm.
- Developed data pipelines using Flume, Sqoop, and Pig to extract data from web logs and store it in HDFS.
- Imported data from Teradata to HDFS through Informatica mappings using Unix scripts.
Environment: Hadoop, Kafka, Spark, Sqoop, Spark SQL, Spark Streaming, Hive, Scala, Pig, NoSQL, Impala, Oozie, HBase, ZooKeeper.
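The Teradata-to-HDFS import via a Spark API mentioned above can be sketched minimally with Spark's JDBC data source. The Teradata host, credentials, source table, partitioning bounds, and target Hive table are assumptions, and the Teradata JDBC driver is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object TeradataImport {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("teradata-import")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Teradata connection and source table; reads are
    // parallelized across 8 JDBC partitions on TXN_ID.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:teradata://td-host/DATABASE=SALES")
      .option("driver", "com.teradata.jdbc.TeraDriver")
      .option("dbtable", "SALES.DAILY_TXN")
      .option("user", "etl_user")
      .option("password", "secret")
      .option("numPartitions", "8")
      .option("partitionColumn", "TXN_ID")
      .option("lowerBound", "1")
      .option("upperBound", "100000000")
      .load()

    // Land the data in HDFS as a Parquet-backed Hive table for downstream HiveQL.
    df.write.mode("overwrite").format("parquet").saveAsTable("landing.daily_txn")

    spark.stop()
  }
}
```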
Confidential
SQL Developer
Responsibilities:
- Optimized SQL queries for improved performance and availability.
- Created backup and restore operations.
- Optimized long running stored procedures and queries for effective data retrieval.
- Performed analysis and presented results using SQL, SSIS, MS Access, Excel, and Visual Basic scripts.
- Used SSIS data flow transformations such as Derived Column, Data Conversion, Merge Join, Union All, Conditional Split, Lookup, and Row Count.
- Created or modified the ETL processes to achieve the daily/monthly loading requirements.
- Involved in performance tuning and optimization of the SQL Server application using Performance Monitor.
- Created views and business logic to reduce database complexity for easier reporting.
- Wrote and automated tools and scripts to increase departmental efficiency and automate repeatable tasks.
- Supported and assisted QA engineers in understanding, testing, and troubleshooting.
- Wrote build scripts using Ant and participated in the deployment of one or more production systems.
- Responsible for managing data coming from different sources.
Environment: SQL, SSIS, ETL, DDL, MS SQL server 2012/R2/2008, Toad for DB2.