AWS/Hadoop Developer Resume
Charlotte, NC
SUMMARY:
- 7+ years of extensive IT experience in all phases of Software Development Life Cycle (SDLC) with skills in data analysis, design, development, testing and deployment of software systems.
- 3+ years of strong experience working on Apache Hadoop ecosystem components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Oozie, Zookeeper, Flume, and Spark (with Python) on CDH4 and CDH5 distributions, along with EC2 cloud computing on AWS.
- Key participant in all phases of the software development life cycle, including analysis, design, development, integration, implementation, debugging, and testing of software applications in client-server, object-oriented, and web-based environments.
- Strong in developing MapReduce applications, configuring development environments, tuning jobs, and creating MapReduce workflows.
- Experience in performing data enrichment, cleansing, analytics, and aggregations using Hive and Pig.
- Knowledge of the Cloudera Hadoop distribution and other widely used distributions such as Hortonworks and MapR.
- Hands-on experience working with the Cloudera CDH3 and CDH4 platforms.
- Proficient in big data ingestion and streaming tools like Flume, Sqoop, Kafka, and Storm.
- Experience with data formats such as JSON, Avro, Parquet, RC, and ORC, and compression codecs such as Snappy and bzip2.
- Experienced in analyzing data using HiveQL and Pig Latin, and in extending Hive and Pig core functionality with custom UDFs.
- Good knowledge of NoSQL databases and hands-on experience writing applications on NoSQL databases such as Cassandra and MongoDB.
- Good knowledge of scripting languages such as Linux/Unix shell scripting and Python.
- Good knowledge of data warehousing concepts and ETL processes.
- Experience importing streaming data into HDFS using Flume sources and sinks, transforming the data with Flume interceptors, and analyzing it using Pig and Hive.
- Configured ZooKeeper to coordinate the servers in clusters and maintain data consistency.
- Used the Oozie and Control-M workflow engines for managing and scheduling Hadoop jobs.
- Diverse experience working with a variety of databases such as Teradata, Oracle, MySQL, IBM DB2, and Netezza.
- Familiar with the secure global infrastructure AWS provides, plus the range of features used to secure data in the cloud.
- Hands-on experience with AWS cloud services (VPC, EC2, S3, RDS, Redshift, Data Pipeline, EMR, DynamoDB, WorkSpaces, Lambda, Kinesis, SNS, SQS); a short illustrative sketch follows this summary.
- Good experience with AWS Elastic Block Store (EBS), its different volume types, and selecting the appropriate EBS volume type based on requirements.
- Worked with and learned a great deal from Amazon Web Services (AWS) cloud services such as EC2, S3, EBS, RDS, and VPC.
- Implemented solutions using the variety of AWS computing and networking services available to meet application needs.
- Good knowledge of Core Java and J2EE technologies such as Hibernate, JDBC, EJB, Servlets, JSP, JavaScript, Struts, and Spring.
- Experienced in using IDEs and tools such as Eclipse, NetBeans, IntelliJ, GitHub, Jenkins, and Maven.
- Implemented a POC to migrate MapReduce programs into Spark transformations using Spark and Scala.
- Ability to spin up different AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
- Strong team player with the ability to work independently as well as in a team, adapt to a rapidly changing environment, and a commitment to learning; excellent communication, project management, documentation, and interpersonal skills.
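A short, purely illustrative sketch of the AWS provisioning and storage work summarized above, using boto3; the AMI ID, key pair, subnet, bucket, and file names are hypothetical placeholders, not values from any actual engagement.

```python
# Illustrative only: launch an EC2 instance and stage a file in S3 with boto3.
# Every identifier below (AMI, key pair, subnet, bucket) is a placeholder.
import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")
s3 = boto3.client("s3", region_name="us-east-1")

# Launch a single general-purpose instance inside an existing VPC subnet.
instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",      # placeholder AMI
    InstanceType="t2.micro",
    KeyName="my-keypair",                 # placeholder key pair
    SubnetId="subnet-0123456789abcdef0",  # placeholder subnet
    MinCount=1,
    MaxCount=1,
)
print("Launched instance:", instances[0].id)

# Stage a small dataset in S3 for downstream EMR/Hive jobs.
s3.upload_file("policies.csv", "my-data-bucket", "landing/policies.csv")
```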
TECHNICAL SKILLS:
Big Data: Hadoop, HDFS, MapReduce, Hive, Sqoop, Pig, HBase, Flume, Zookeeper, Oozie, Impala, Kafka, Spark, NiFi
Databases: SQL Server, MySQL, Oracle, HBase, Netezza
Languages: SQL, PL/SQL, HTML, Java, J2EE, JSP, Servlets, Hibernate, JDBC, UNIX Shell Scripting, Python
Tools: Eclipse, NetBeans, IntelliJ, Maven, Anthill, SQLExplorer, TOAD
Version Control: GitHub, SVN
Operating Systems: Windows Server 2008/2012, UNIX, Linux
AWS Services: EC2, ELB, RDS, S3, CloudWatch, SNS, SQS, EBS
Packages: MS Office Suite, MS Visio, MS Project Professional
DevOps: Chef, Puppet, Jenkins
Other Tools: PuTTY, WinSCP, EDI (Gentran), StreamWeaver, Amazon AWS
PROFESSIONAL EXPERIENCE:
Confidential, Charlotte, NC
AWS/Hadoop Developer
Responsibilities:
- Worked extensively on Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, Spark, and MapReduce programming.
- Analyzed the data that needed to be loaded into Hadoop and worked with the respective source teams to obtain table information and connection details.
- Used Sqoop to import data from different RDBMS systems such as Oracle, DB2, and Netezza and loaded it into HDFS.
- Created Hive tables and partitioned the data for better performance; implemented Hive UDFs and performed tuning for better results.
- Developed Map-Reduce programs to clean and aggregate the data.
- Implemented optimized map joins to get data from different sources to perform cleaning operations before applying algorithms.
- Developed workflows in Oozie to manage and schedule jobs on the Hadoop cluster and trigger daily, weekly, and monthly batch cycles.
- Implemented a POC to introduce Spark transformations.
- Worked continuously with architects to design a Spark model for the existing MapReduce model.
- Initially migrated existing MapReduce programs to the Spark model using Python.
- Used RDDs to perform transformations on datasets as well as actions such as count, reduce, and first; a minimal sketch of this migration pattern follows this project's environment line.
- Hands-on experience with various AWS cloud services such as Redshift clusters and Route 53 domain configuration.
- Built an on-demand, secure EMR launcher with custom spark-submit steps using S3 events, SNS, KMS, and Lambda functions.
- Extensive working knowledge of NiFi.
- Migrated an existing on-premises application to AWS.
- Used AWS services like EC2 and S3 for small data sets.
- Used CloudWatch Logs to move application logs to S3 and created alarms based on exceptions raised by the applications.
- Coordinated continuously with the QA, production support, and deployment teams.
- Implemented test scripts to support test driven development and continuous integration.
- Created analysis documents capturing table load types (truncate-and-load or incremental load), update frequency, source database connection details, etc.
- Documented all tables created to ensure all transactions were recorded properly.
- Analyzed data by running Hive queries and Pig scripts to study the transactional behavior of policies and plans.
- Developed shell scripts to move files (received through SFTP) from the landing zone server to HDFS, update the file tracker, and send mail after execution completes.
- Participated in design and implementation discussions for the Cloudera 5 Hadoop ecosystem and supported the team through Cloudera version updates.
- Worked in an Agile development environment using the Kanban methodology; participated in daily scrums and other design-related meetings.
Environment: Hadoop, CDH, MapReduce, Hive, Pig, Sqoop, HBase, Java, Spark, Oozie, Linux, Python, DB2, Oracle, AWS.
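Below is a minimal, hypothetical PySpark sketch of the MapReduce-to-Spark migration pattern referenced in the RDD bullet above: a map/reduce-style aggregation rewritten as RDD transformations, followed by the count, reduce, and first actions. The HDFS path, record layout, and field positions are assumptions for illustration only.

```python
# Hypothetical sketch: migrating a MapReduce-style aggregation to Spark RDDs.
# The HDFS path and record layout below are placeholders, not project values.
from pyspark import SparkContext

sc = SparkContext(appName="mr-to-spark-migration")

# Equivalent of the mapper: parse each record and emit (policy_id, amount) pairs.
lines = sc.textFile("hdfs:///data/landing/transactions/*.csv")
pairs = (lines
         .map(lambda line: line.split(","))
         .filter(lambda cols: len(cols) >= 3)              # drop malformed rows
         .map(lambda cols: (cols[0], float(cols[2]))))

# Equivalent of the reducer: sum amounts per key.
totals = pairs.reduceByKey(lambda a, b: a + b)

# The actions mentioned in the bullet above: count, reduce, first.
print("distinct keys:", totals.count())
print("grand total:", totals.map(lambda kv: kv[1]).reduce(lambda a, b: a + b))
print("sample record:", totals.first())

sc.stop()
```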
Confidential, Greenville, SC
Hadoop Developer
Responsibilities:
- Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive and MapReduce.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Developed Pig Scripts for change data capture and delta record processing between newly arrived data and already existing data in HDFS.
- Involved in loading data from UNIX file system to HDFS.
- Worked on analyzing data with Hive and Pig and on real-time analytical operations using HBase.
- Created views over HBase tables and used SQL queries to retrieve alerts and metadata.
- Worked with the HBase NoSQL database.
- Helped and directed testing team to get up to speed on Hadoop Data testing.
- Worked on loading and transforming large sets of structured, semi-structured, and unstructured data.
- Implemented MapReduce secondary sorting to improve the performance of sorted results in MapReduce programs.
- Designed data visualizations to present the department's current impact and growth using the Python package Matplotlib; an illustrative sketch follows this project's environment line.
- Involved in data analysis using Python and handled ad-hoc requests as required.
- Developed Python scripts for automating tasks.
- Worked on user-defined functions in Hive to load data from HDFS and run aggregation functions across multiple rows.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Coordinated with testing team for bug fixes and created documentation for recorded data, agent usage and release cycle notes.
- Created various UDFs and UDAFs to analyze partitioned, bucketed data, compute metrics for dashboard reporting, and store the results in summary tables.
- Used Oozie Workflow engine to run multiple Hive and Pig jobs.
- Created stored procedures, triggers and functions to operate on report data in MySQL.
- Wrote backend code in Java to interact with the database using JDBC.
Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, HBase, Java, Python, Oozie, Linux, UNIX.
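Below is a small illustrative Matplotlib sketch of the kind of departmental growth visualization referenced above; the metric, labels, and values are sample placeholders rather than real report data.

```python
# Illustrative only: plot a simple month-over-month growth trend with Matplotlib.
# The numbers below are placeholders, not actual departmental metrics.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
processed_records = [1.2, 1.5, 1.9, 2.4, 2.8, 3.3]  # millions of records (sample values)

plt.figure(figsize=(8, 4))
plt.plot(months, processed_records, marker="o")
plt.title("Records Processed per Month (sample data)")
plt.xlabel("Month")
plt.ylabel("Records (millions)")
plt.grid(True)
plt.tight_layout()
plt.savefig("department_growth.png")  # saved for inclusion in reports
```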
Confidential, Columbus, OH
Hadoop Developer
Responsibilities:
- Involved in analyzing requirements and establishing development capabilities to support future opportunities.
- Involved in sharing data with teams that analyze and prepare reports on risk management.
- Handled importing of data from various data sources, performed transformations using Pig and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Worked on exporting the analyzed data back to the existing relational databases using Sqoop, making it available to the BI team for visualization and report generation.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Pig scripts.
- Performed various optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins; see the illustrative sketch after this project's environment line.
- Involved in end-to-end implementation of ETL logic.
- Coordinated effectively with the offshore team and managed project deliverables on time.
- Worked on QA support activities, test data creation and Unit testing activities.
- Developed Oozie workflows that are scheduled to run on a monthly basis.
- Designed and developed read lock capability in HDFS.
- Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
- Analyzed web server log data collected using Apache Flume.
Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, HBase, SQL, Oozie, Linux, UNIX.
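Below is a hypothetical illustration of the Hive-side optimizations referenced above (map-side joins plus partitioned, bucketed tables). It is shown through PyHive purely for demonstration; the host, table, and column names are assumptions, and the actual work in this project used the Hive CLI and Pig rather than Python.

```python
# Hypothetical illustration (via PyHive) of standard Hive optimizations:
# map-side joins and a partitioned, bucketed table. Host, table, and column
# names are placeholders only.
from pyhive import hive

conn = hive.Connection(host="hive-server.example.com", port=10000, username="etl_user")
cur = conn.cursor()

# Let Hive convert joins against small dimension tables into map-side joins.
cur.execute("SET hive.auto.convert.join=true")
cur.execute("SET hive.optimize.bucketmapjoin=true")

# Partition by load date and bucket by customer id for pruning and faster joins.
cur.execute("""
    CREATE TABLE IF NOT EXISTS transactions_opt (
        txn_id      STRING,
        customer_id STRING,
        amount      DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")
```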
Confidential
SQL Developer
Responsibilities:
- Involved in all phases of the Software Development Life Cycle (SDLC) and created UML diagrams such as use case, class, and sequence diagrams to represent the detailed design phase.
- Created new tables, views, indexes, and user-defined functions.
- Performed daily database backups and restorations and monitored the performance of the database server.
- Actively tuned the database design to speed up certain daily jobs and stored procedures.
- Optimized query performance by creating indexes.
- Developed stored procedures and views to supply data for all reports; used complex formulas to show derived fields and to format data based on specific conditions.
- Involved in SQL Server administration by creating users and login IDs with appropriate roles and granting privileges to users and roles; worked on authentication modules to provide controlled access to users across various modules.
- Created joins and subqueries for complex queries involving multiple tables.
- Developed stored procedures and triggers using PL/SQL to calculate values and update tables implementing business logic.
- Responsible for report generation using SQL Server Reporting Services (SSRS) and Crystal Reports based on business requirements.
- Developed complex SQL queries, stored procedures, and triggers to perform efficient data retrieval operations.
- Designed and Implemented tables and indexes using SQL Server.
Environment: Eclipse, Java/J2EE, Oracle, HTML, PL/SQL, XML, SQL.
Confidential
Programmer Analyst/ SQL Developer
Responsibilities:
- Developed SQL scripts to perform different joins, subqueries, nested queries, and insert/update/delete operations on MS SQL database tables.
- Experience in writing PL/SQL and in developing and implementing Stored Procedures, Packages and Triggers.
- Experience with modeling principles, database design and programming, and creating E-R diagrams and data relationships to design a database.
- Responsible for designing advanced SQL queries, procedures, cursors, and triggers.
- Built data connections to the database using MS SQL Server.
- Worked on a project to extract data from XML files into SQL tables and generate data file reports using SQL Server 2008.
- Used the Tomcat web server for development purposes.
- Involved in creation of Test Cases for Unit Testing.
Environment: PL/SQL, MySQL, SQL Server 2008 (SSRS & SSIS), Visual Studio 2000/2005, MS Excel.
Confidential
Java Developer
Responsibilities:
- Responsible for understanding the scope of project and requirement gathering.
- Developed the web tier using JSP, Struts MVC to show account details and summary.
- Created and maintained the configuration of the Spring Application Framework.
- Implemented various design patterns - Singleton, Business Delegate, Value Object and Spring DAO.
- Used Spring JDBC to write some DAO classes which interact with database to access account information.
- Mapped business objects to database using Hibernate.
- Involved in writing Spring configuration XML files containing bean declarations and other dependent object declarations.
- Used the Tomcat web server for development purposes.
- Involved in creation of Test Cases for Unit Testing.
- Used Oracle as the database and TOAD for query execution; also involved in writing SQL scripts and PL/SQL code for procedures and functions.
- Developed the application using Eclipse and used Maven as the build and deployment tool.
- Used Log4j to print debug, warning, and info log messages to the server console.
Environment: Java, J2EE, JSON, Linux, XML, XSL, CSS, JavaScript, Eclipse.