Senior Data Engineer Resume
Malvern, PA
SUMMARY:
- 5+ years of experience in software design and development using the Hadoop Big Data ecosystem.
- 4 years of IT experience in analyzing, designing, developing, implementing, and testing software applications in Big Data analytics and development using HDFS, Hive, Sqoop, Spark Core, Spark Streaming, Spark SQL, Kafka, NiFi, and ZooKeeper.
- 7 years of IT experience in software design and development using Core Java, Spring Core, JDBC, and XML on UNIX operating systems.
- Excellent knowledge of the Hadoop ecosystem, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Excellent knowledge of the Spark ecosystem (Spark executors, cores, memory, and jobs).
- Involved in Data Ingestion to HDFS from various data sources.
- Strong experience with Hadoop distributions such as the Hortonworks and Cloudera data platforms.
- Experience in designing and developing applications in Spark using the Java API.
- Good understanding of SQL and NoSQL databases, with hands-on experience writing applications against them.
- Experienced in developing NiFi data flow processors that work with different file formats such as text, JSON, Parquet, and Avro.
- Extensive experience importing and exporting data using stream-processing platforms such as Kafka.
- Developed multiple Kafka producers and consumers from scratch per the business requirements (a minimal producer/consumer sketch follows this summary).
- Executed Hive commands for reading, writing, and managing large datasets residing in distributed storage.
- Analyzed large data sets by running Hive queries.
- Imported and exported data from relational and NoSQL databases, including MySQL, using Sqoop.
- Able to analyze different file formats such as Avro and Parquet.
- Good experience with REST web services.
- Extensive exposure to all aspects of the Software Development Life Cycle (SDLC), i.e. requirements definition for customization, prototyping, coding, testing, and delivery.
- Experienced with multiple development approaches, including Waterfall, Test-Driven Development, and Agile Scrum.
- In-depth knowledge of fundamental and advanced concepts for working with Amazon EBS, ELB, CloudFront, AWS IAM, and Amazon VPC.
- Expert knowledge of creating SNS topics and alerts for service failures.
- Expert knowledge of AWS Lambda; created multiple Lambda functions to move data, build dashboards, and push results to Slack and email.
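The following is a minimal sketch of the kind of Kafka producer and consumer described above; it assumes the kafka-python client, a local broker address, and a hypothetical "orders" topic, and is illustrative rather than a production implementation.

import json

from kafka import KafkaProducer, KafkaConsumer

BOOTSTRAP = "localhost:9092"  # placeholder broker address

# Producer: serialize dicts as JSON and publish to the topic.
producer = KafkaProducer(
    bootstrap_servers=BOOTSTRAP,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("orders", {"order_id": 1, "amount": 250.0})
producer.flush()

# Consumer: read from the beginning of the topic and deserialize JSON.
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers=BOOTSTRAP,
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    consumer_timeout_ms=10000,  # stop iterating after 10s of inactivity
)
for message in consumer:
    print(message.value)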
TECHNICAL SKILLS:
Programming Languages: Python 2.x, Scala, HQL, Pig Latin, Presto, SQL, Core Java (Collections, Multi-Threading, Concurrency API, Strings, Memory Management, Serialization, Thread Executors, Design Principles and Design Patterns)
Cloud Services: AWS Glue (ETL), AWS EMR, AWS Data Pipeline, AWS SNS, AWS RDS, AWS Lambda, AWS EC2, AWS VPC, AWS IAM, Google BigQuery
Databases: Aurora MySQL 5.6/5.7, MySQL 5.5/5.6/5.7, MS SQL Server, Oracle
NoSQL Databases: MongoDB, DynamoDB
Big Data / Hadoop Ecosystem: HDFS, MapReduce, Pig, Hive, HBase, Sqoop, ZooKeeper, Spark 1.6.3, Hadoop 2.6
Scripting Languages: Shell Scripting, Bash Scripting
Data Formats: XML, JSON, AVRO, Parquet
Development Tools: Eclipse, Notepad++, PyCharm
Code Repository Tools: GitHub
Web/Application Technologies: REST services, WebLogic, Tomcat
ETL / OLAP: PySpark, SQL Server Reporting Services (SSRS), SSIS, Tableau
Operating Systems: Windows NT/XP/7/8, Windows Server 2008, Linux/Unix
Methodologies: Agile Scrum (Backlog Grooming, Sprint Planning, Daily Stand-ups, Sprint Reviews, Demos), Waterfall
Domains: Telecommunications, Banking, and Finance.
PROFESSIONAL EXPERIENCE:
Confidential, Malvern, PA
Senior Data Engineer
Environment: Spark 1.6.3, Hadoop 2.6, AWS S3, MySQL 5.7, SQL Server 2008, PyCharm, Python 3.6, Python 2.7, AWS EMR, AWS Glue, AWS IAM, AWS Data Pipeline, AWS SNS, AWS Lambda, Kafka, ZooKeeper, Hive, Oozie, Java 8, REST services, CDH (Apache Hadoop), JDK 1.6, Windows, Eclipse.
Responsibilities:
- Migrated and materialized 50 TB of legacy SQL databases to the AWS Cloud.
- Built an Athena database by cataloging metadata in the AWS Glue Data Catalog as well as the Hive metastore, yielding queries roughly 80% faster than the traditional databases.
- Spun up multiple EMR clusters to join and analyze large tables, and manually installed the Presto engine and Jupyter Notebook.
- Created and worked on Developer Endpoints and Notebook servers in AWS Glue.
- Performed support activities as part of the DevOps team; wrote Presto, Hive, SQL, and MySQL queries to extract the data the business was looking for, and wrote a Python Lambda to publish the results to Slack and email (see the Lambda sketch after this list).
- Configured and created VPCs, subnets, and security groups, and spun up multiple EC2 instances.
- Created multiple IAM roles and policies to grant the services in use access to other AWS services.
- Created multiple Data Pipelines to copy DynamoDB tables across regions when streams were not enabled.
- Experienced in designing data warehouses.
- Used Sqoop to copy data from RDBMS tables into HDFS or Hive.
- Assigned a suitable number of mappers, increasing or decreasing it based on table size.
- Converted Avro files to JSON and vice versa.
- Pulled data from relational databases, processed it using Hive or Spark, and exported it with Sqoop to target databases.
- Created and populated staging tables, moving data from source to stage tables and eventually to the final table or target database for further use.
- Performed standard extract, transform, load (ETL) processes on data using Apache Spark; read and wrote files in a variety of formats, converted them to RDDs and DataFrames, and ran Spark SQL on them (a PySpark sketch follows this list).
- Ingested web server log data into HDFS using Flume and Kafka, and processed this data using Spark Streaming and Flink; expert in Kafka components and APIs.
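A minimal sketch of the reporting Lambda described above, assuming a Slack incoming-webhook URL and an SNS topic (for email delivery) supplied through hypothetical environment variables; the query output is represented by a placeholder string rather than a real Presto/Hive/MySQL result.

import json
import os
import urllib.request

import boto3

SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]   # hypothetical env var
SNS_TOPIC_ARN = os.environ["SNS_TOPIC_ARN"]           # hypothetical env var

sns = boto3.client("sns")

def handler(event, context):
    # In the real job this would be the output of a Presto/Hive/MySQL query.
    summary = "Daily row counts: orders=120431, customers=8912"

    # Post the summary to Slack via the incoming webhook.
    payload = json.dumps({"text": summary}).encode("utf-8")
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

    # Publish the same summary to an SNS topic with an email subscription.
    sns.publish(TopicArn=SNS_TOPIC_ARN, Subject="Daily data summary", Message=summary)
    return {"status": "sent"}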
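A minimal PySpark sketch of the ETL step described above, assuming the spark-avro package is available and using hypothetical S3 paths and column names; it shows the file-to-DataFrame-to-Spark-SQL-to-Parquet flow rather than a specific production job.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Read source data (Avro here; spark.read.json / .csv work the same way).
events = spark.read.format("avro").load("s3://example-bucket/raw/events/")
events.createOrReplaceTempView("events")

# Standard transform step expressed as Spark SQL.
daily = spark.sql("""
    SELECT event_date, event_type, COUNT(*) AS event_count
    FROM events
    GROUP BY event_date, event_type
""")

# Write the result to the target location in Parquet, partitioned by date.
daily.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/curated/daily_event_counts/"
)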
Confidential, Jersey City, NJ
Big Data Engineer
Environment: CDH (Apache Hadoop 2.6), ZooKeeper, Hive, Oozie, NiFi, AWS EMR, AWS Glue, AWS IAM, AWS Data Pipeline, AWS SNS, AWS Lambda, AWS S3, Kafka, Spark 1.6.3, MySQL 5.7, SQL Server 2008, Spyder, PyCharm, Python 3.6, Python 2.7, Java 8, REST services, JDK 1.6, Windows, Eclipse, ClearCase, Ext JS, RESTful Web Services.
Responsibilities:
- Worked on Apache Spark and Hive.
- Extended Hive core functionality with custom User Defined Functions (UDF), User Defined Table-Generating Functions (UDTF), and User Defined Aggregate Functions (UDAF).
- Maintained, developed, and normalized AWS Aurora MySQL databases in different regions.
- Migrated data across regions between Aurora MySQL databases and from on-premises legacy SQL databases.
- Created a dashboard to analyze data streamed through Kafka on Apache Spark using the Spark Streaming libraries.
- Created a MySQL Aurora RDS instance and migrated the data to it.
- Organized data and data flows on the newly created data lake using the data governance tool Talend.
- Designed, developed, and maintained DynamoDB access patterns for Lambda functions.
- Wrote multiple Terraform modules: (1) deployed DynamoDB tables, and (2) configured an S3 backend for saving state files across environments such as Development, Stage, and Production.
- Used AWS Glue extensively to move data, with the required transformations, between MySQL databases and between MySQL, DynamoDB, and S3 in both directions (see the Glue job sketch after this list).
- Wrote Python scripts to recreate HTML pages containing Razor logic, save them in S3, and store their links in a MySQL database for quick access.
- Wrote multiple Lambda functions to push data from MySQL databases to S3, and scheduled their triggers.
- Used AWS Database Migration Service extensively for ongoing replication between two production MySQL databases for data backup.
- Troubleshot database connectivity issues from app servers using Postman and SoapUI, checking response codes by creating the required payload, headers, and URL.
- Used GitHub to push, pull, and clone scripts, and created multiple branches depending on the environment.
- Troubleshot the components above and quickly resolved issues.
- Scheduled and participated in Scrum meetings, sprint planning, and other team development activities.
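A minimal sketch of a Glue-style PySpark job of the kind described above, assuming the AWS Glue job environment and a hypothetical DynamoDB table name and S3 path; the real jobs would add the required transformations between read and write.

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the source DynamoDB table as a DynamicFrame.
source = glue_context.create_dynamic_frame.from_options(
    connection_type="dynamodb",
    connection_options={"dynamodb.input.tableName": "customer_profiles"},  # hypothetical table
)

# Write it out to S3 as Parquet for downstream analytics.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/exports/customer_profiles/"},
    format="parquet",
)

job.commit()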
Confidential, Princeton, NJ
Data Warehouse Engineer
Environment: Spark, Spyder, PyCharm, Python 3.6, Python 2.7
Responsibilities:
- Install, configure and deploy software, provide quality assurance.
- Troubleshoot various software issues using debugging process and coding techniques.
- Collaborate in planning initiatives in Application Development and best practices.
- Worked with business teams and technical analysts to understand business requirements and determine how to leverage Hadoop technologies to create solutions that satisfy them.
- Optimized the Hadoop environment, including MapReduce, Spark, and HDFS footprints, along with Hadoop security, data management, and governance (a Spark tuning sketch follows this list).
- Extended Hive and Pig core functionality with custom User Defined Functions (UDF), User Defined Table-Generating Functions (UDTF), and User Defined Aggregate Functions (UDAF).
- Implemented reusable, high-throughput components using Java, SQL, and the Spring framework.
- Performed Hadoop development and implementation to handle large volumes of structured and unstructured data.
- Transformed disparate data sets and loaded them into the HDFS file system for further analysis.
- Pre-processed data using Hive and Pig and produced analytics and predictions.
- Worked on Apache Spark, Hive, Pig, MapReduce and HBase.
- Developed, tested, and deployed on a stable cluster using the Hortonworks distribution.
- Used a hybrid Agile and Waterfall development methodology.
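A minimal sketch of the kind of Spark resource tuning referred to above; the executor, memory, and partition values are placeholders rather than measured settings, and the workload is a hypothetical HDFS pre-aggregation step.

from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = (
    SparkConf()
    .setAppName("tuned-preprocessing")
    .set("spark.executor.instances", "10")       # sized to the cluster, not measured
    .set("spark.executor.cores", "4")
    .set("spark.executor.memory", "8g")
    .set("spark.sql.shuffle.partitions", "200")  # match shuffle width to data volume
)

spark = SparkSession.builder.config(conf=conf).getOrCreate()

# Example workload: pre-aggregate raw HDFS data before Hive/Pig analysis.
raw = spark.read.parquet("hdfs:///data/raw/events/")
raw.groupBy("event_type").count().write.mode("overwrite").parquet(
    "hdfs:///data/curated/event_type_counts/"
)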
Confidential
Database Administrator
Environment: Linux, Windows 98/NT/2000, Oracle 11g/12c, SQL, Import/Export, Database tools/Utilities.
Responsibilities:
- Have worked and supported both Development and Production Database environments.
- Install and configure DBMS software, upgrades, and related products; recommend and assess new versions and products.
- Integrate configurations and applications.
- Manage multiple concurrent RDBMS instances, both production and development or test on various platforms.
- Managed Database security.
- Implemented the Backup and recovery strategy for the project.
- Reverse and forward engineer databases.
- Migration of DDL and DML from one database to another.
- Use of Explain and other tools to monitor and trace SQL usage.
- Troubleshoot database issues like connectivity and slowdowns.
- Creation of customized database scripts for administrative purposes.
- Create and maintain data dictionary with emphasis on business rules.
- Maintain/Restore database using RMAN.
- Find existing database areas for improvement (remapping of tablespaces, adding indexes, etc.)
- Create Crystal Reports based on business areas needed for analytical study.
- Extensively used SQL*Plus for querying and reporting.
- Monitor and troubleshoot database backups, disk space and server availability.
- Identifying root cause of Oracle errors and providing solutions to resolve errors.
- Creating database reorganization procedures, scripting database alerts and monitoring scripts (a sample monitoring script follows this list).
- Generated internal DBA team documents, including problem-resolution documents, lessons-learned documents, and how-to documents.
- Controlling and maintaining system security through Profiles, Roles, Privileges, and Auditing.
- Configured and managed database replication to provide high availability and various levels of failover capabilities.
- Participate in the development and testing of disaster plans and security processes.
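A minimal sketch of a tablespace-usage alert script of the kind described above, assuming the cx_Oracle driver, a hypothetical monitoring account and DSN, and an illustrative threshold; in practice the alert would go out by email or a ticketing system rather than a print statement.

import cx_Oracle

DSN = cx_Oracle.makedsn("db-host.example.com", 1521, service_name="ORCLPDB1")  # placeholder
THRESHOLD_PCT = 85  # alert when a tablespace is more than 85% used (illustrative)

QUERY = """
SELECT tablespace_name,
       ROUND(used_percent, 1) AS used_percent
FROM   dba_tablespace_usage_metrics
ORDER  BY used_percent DESC
"""

# Credentials shown inline for brevity; a real script would read them from a wallet or env vars.
with cx_Oracle.connect(user="monitor", password="changeme", dsn=DSN) as conn:
    with conn.cursor() as cur:
        for name, used_pct in cur.execute(QUERY):
            if used_pct > THRESHOLD_PCT:
                print(f"ALERT: tablespace {name} is {used_pct}% used")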