Big Data Technical Architect Resume
SUMMARY:
- Over 15 years of professional IT experience with an emphasis on Big Data technologies, working with many large-scale applications in various domains including Finance, Banking, Insurance, and Healthcare.
- Cloudera Certified Developer for Apache Hadoop (CCDH 410, Version 5).
- Experience in the complete Software Development Life Cycle of application development (requirements gathering, analysis, design, development, testing, and implementation).
- Expertise in Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, and Sqoop.
- Extensive experience working in the Big Data Hadoop ecosystem comprising Apache Spark 2.3, the PySpark API, Docker, MapReduce, Hive, Pig, Apache Oozie, Sqoop, Flume, HDFS, and Apache Avro.
- Expertise in working on AWS using Lambda, EMR, Redshift, SNS, SES, Glue, Data Pipeline, S3, API Gateway, Athena, Amazon Kinesis, and DynamoDB (NoSQL).
- Excellent understanding of Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Extensive experience in importing/exporting data between RDBMS and the Hadoop ecosystem using Apache Sqoop.
- Good experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Extensive experience in creating complete workflow chains from scratch for multiple projects within the client domain using Apache Oozie; workflow scheduling involves MapReduce, Hive, PySpark, shell script, and email actions, with the output of one workflow fed as input to another.
- Good experience with the Cloudera platform and Cloudera Manager.
- Migrated revenue data from Oracle to Hadoop, Hive, and Amazon Redshift.
- Very strong industry experience in Apache Hive for data transformation.
- Strong experience in both development and maintenance/support projects.
- Good team player with excellent communication skills, able to work in both team and individual environments.
- Strong exposure to IT consulting, software project management, team leadership, design, development, implementation, maintenance/support, and integration of enterprise software applications.
- Extensive experience in conducting feasibility studies, plan reviews, implementations, and post-implementation surveys.
- Demonstrated ability to work independently, showing a high degree of self-motivation and initiative.
- Excellent problem-solving and troubleshooting capabilities; quick learner, result-oriented, and an enthusiastic team player.
- Extensive experience in designing and developing Spark applications using Python.
- Excellent analytical and problem-solving skills.
TECHNICAL SKILLS:
Big Data Technologies: Apache Spark 2.3, PySpark (Python API for Spark), Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, ZooKeeper, Oozie, Impala, Apache Avro
Programming Languages: Python, Java, COBOL
Databases: Oracle, DB2, HBase, MySQL, Redshift
AWS: Lambda, EMR, Redshift, CFT, ECS, SNS, SES, Glue, Data Pipeline, S3, API Gateway, Athena, Amazon Kinesis, DynamoDB (NoSQL)
NoSQL Database: DynamoDB
Operating Systems: Windows, Linux, UNIX
Schedulers: Control-M and Oozie
Other Tools/Utilities: TSO/ISPF, QMF, SPUFI, SDF II, Changeman, CVS, SVN, Git
Defect Tracking Tools: HP Quality Center
PROFESSIONAL EXPERIENCE:
Confidential
Big Data Technical Architect
Responsibilities:
- Designed and developed end-to-end applications for data ingestion, organized data layers, and business use cases.
- Developed DynamoDB components to store the insights data.
- Developed AWS Glue ETL jobs using PySpark to securely transform datasets in S3 curated storage into consumption data views.
- Worked on Continuous Integration, Continuous Deployment, Build Automation, and Test-Driven Development to enable rapid delivery of end-user capabilities using the Amazon Web Services (AWS) stack (CodeCommit, CodeDeploy, CodePipeline, CodeBuild, IAM, CFT).
- Designed and developed the insights applications using AWS Lambda, SNS, Glue, S3, API Gateway, and Athena.
- Developed a PySpark AWS Glue ETL job to process raw and aggregated insights data in Parquet and push the output to S3 and DynamoDB (see the sketch after this list).
- Created the system architecture/design and performed software development for the insights application.
- Worked on AWS CloudFormation to provision AWS resources (S3, SNS, RDS, EMR, Glue, Lambda, DynamoDB).
- Developed Spark code to implement data quality checks that validate processed data across the system (record counts, flight counts, etc.).
- Leveraged Amazon Athena for ad-hoc query analytics.
- Analyzed business requirements, produced the design/architecture identifying the different components and flow diagrams, and discussed them with the team.
- Participated in the end-to-end project life cycle, from requirements and design through development and testing.
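A minimal, hypothetical sketch of the PySpark AWS Glue ETL pattern referenced above (read raw Parquet from S3, aggregate, write back to S3 and DynamoDB); the bucket paths, column names, and DynamoDB table name are illustrative placeholders rather than project values:

    # Illustrative PySpark Glue job skeleton; all names are placeholders.
    import sys
    import boto3
    from awsglue.context import GlueContext
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from pyspark.sql import functions as F

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    spark = glue_context.spark_session

    # Raw insights data landed in S3 as Parquet
    raw = spark.read.parquet("s3://example-insights-raw/events/")

    # Aggregate into a consumption-ready daily view
    daily = (raw.groupBy("customer_id", "event_date")
                .agg(F.count("*").alias("event_count")))

    # Persist the curated view back to S3 in Parquet
    daily.write.mode("overwrite").parquet("s3://example-insights-curated/daily_counts/")

    # Push a bounded slice of aggregates to DynamoDB for low-latency lookups
    table = boto3.resource("dynamodb").Table("insights_daily_counts")
    with table.batch_writer() as writer:
        for row in daily.limit(1000).collect():
            writer.put_item(Item={
                "customer_id": row["customer_id"],
                "event_date": str(row["event_date"]),
                "event_count": int(row["event_count"]),
            })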
Confidential
Big Data Technical Lead
Environment: AWS EMR, AWS S3, AWS CloudWatch, RDS (MySQL), HDFS, Hive, Redshift, Sqoop, Oozie Workflows, Shell Scripts, Spark
Responsibilities:
- Analyzed the requirements and the existing environment to help define the right strategy to build the BIC system.
- Developed Spark and Hive scripts for data processing.
- Developed Oozie workflows and coordinators to integrate other systems such as Denodo, Hadoop ETL (Hive, Sqoop), Redshift, and CloudWatch.
- Enabled the Oozie SLA feature to alert on long-running jobs.
- Built and owned the data ingestion process from different sources into the Hadoop cluster.
- Developed PySpark jobs to process raw data in Parquet and push the output to S3.
- Worked on ETL scripts to pull data from the Denodo database into HDFS.
- Developed Hive tables to load data from different sources.
- Involved in database schema design.
- Developed scripts to load data into Redshift from Hive tables (see the sketch after this list).
- Created different views in Redshift for different applications.
- Stored job status in MySQL RDS.
- Proposed an automated shell-script-based system to run the Sqoop jobs.
- Worked in an Agile development approach.
- Created the estimates and defined the sprint stages.
- Mainly worked on Hive queries to categorize data of different claims.
- Set up CloudWatch monitoring for the application.
- Monitored system health and logs and responded to any warning or failure conditions.
- Involved in the design of distribution styles for Redshift tables.
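A minimal, hypothetical sketch of the Hive-to-Redshift load referenced above: data exported from Hive to S3 is bulk-loaded into a Redshift table with a COPY command issued from Python. The connection details, bucket, table, and IAM role are placeholders, not project values:

    # Illustrative Redshift COPY driver; all identifiers are placeholders.
    import psycopg2

    conn = psycopg2.connect(
        host="example-cluster.redshift.amazonaws.com",
        port=5439, dbname="bic", user="etl_user", password="***")

    copy_sql = """
        COPY reporting.claims_summary
        FROM 's3://example-bic-exports/claims_summary/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-copy-role'
        FORMAT AS PARQUET;
    """

    with conn, conn.cursor() as cur:
        cur.execute(copy_sql)   # bulk-load the exported Hive data into Redshift
    conn.close()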
Confidential
Big Data Technical Lead
Environment: CDH 5, HDFS, Hive, Impala, Java, Sqoop, Tableau, Oozie Workflows, Shell Scripts, IntelliJ, Gradle, Core
Responsibilities:
- Analyzed the requirements and the existing environment to help define the right strategy to build the DDSW system.
- Designed and executed Oozie workflows using Hive, Python, and shell actions to extract, transform, and load data into Hive tables.
- Worked extensively with Avro and Parquet file formats.
- Involved in low-level design of MapReduce, Hive, Impala, and shell scripts to process data.
- Worked on ETL scripts to pull data from the Oracle database into HDFS.
- Developed Hive tables to load data from different sources.
- Involved in database schema design.
- Involved in sprint planning and sprint retrospective meetings.
- Participated in the daily Scrum status meeting.
- Proposed an automated shell-script-based system to run the Sqoop jobs.
- Worked in an Agile development approach.
- Created the estimates and defined the sprint stages.
- Developed a strategy for full and incremental loads using Sqoop.
- Mainly worked on Hive/Impala queries to categorize data of different claims.
- Implemented partitioning, dynamic partitions, and buckets in Hive (see the sketch after this list).
- Generated final reporting data using Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC connector.
- Wrote Python scripts to generate alerts.
- Monitored system health and logs and responded to any warning or failure conditions.
- Implemented a POC on AWS.
- Worked on Kerberos authentication for Hadoop.
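A minimal, hypothetical sketch of the Hive dynamic-partitioning pattern referenced above, expressed through Spark's Hive support (bucketing was handled separately in the Hive DDL and is omitted here); the database, table, and column names are illustrative placeholders:

    # Illustrative dynamic-partition load into a partitioned Hive table.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partitioning-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Partitioned, Parquet-backed target table (placeholder schema)
    spark.sql("""
        CREATE TABLE IF NOT EXISTS claims.claims_by_year (
            claim_id STRING,
            member_id STRING,
            claim_amount DOUBLE
        )
        PARTITIONED BY (claim_year INT)
        STORED AS PARQUET
    """)

    # Allow dynamic partitions, then load from a staging table;
    # the partition column must be the last column selected.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT OVERWRITE TABLE claims.claims_by_year PARTITION (claim_year)
        SELECT claim_id, member_id, claim_amount, claim_year
        FROM claims.claims_staging
    """)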
Confidential
Big Data Technical Lead
Environment: Hadoop, HDFS, MapReduce, HBase, Hive, Flume, Oozie, DB2 and Cloudera Hadoop Distribution (CDH 4).
Responsibilities:
- Lead developer for migrating application archival data to the Big Data platform.
- Worked on analyzing the Hadoop cluster using different big data analytic tools, including Hive and MapReduce.
- Responsible for managing data from multiple sources.
- Involved in managing and reviewing Hadoop log files.
- Involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Developed simple to complex MapReduce jobs using Hive.
- Created HBase tables to store data in various formats.
- Implemented test scripts to support test-driven development and continuous integration.
- Worked on tuning the performance of Hive queries.
- Responsible for building scalable distributed data solutions.
- Imported and exported data into HDFS and Hive using Sqoop.
- Extracted data from DB2 through Sqoop, placed it in HDFS, and processed it (see the sketch after this list).
- Supported MapReduce programs running on the cluster.
- Provided batch processing solutions for large volumes of unstructured data using the Hadoop MapReduce framework.
- Involved in scheduling the Oozie workflow engine to run multiple Hive jobs.
- Actively participated in the software development lifecycle (scope, design, implement, deploy, test), including design and code reviews, test development, and test automation.
- Documented system processes and procedures for future reference.
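A minimal, hypothetical sketch of the Sqoop-based DB2-to-HDFS extraction referenced above, wrapped in a small Python driver; the JDBC URL, credentials path, table, and target directory are illustrative placeholders:

    # Illustrative Sqoop import driver; all connection details are placeholders.
    import subprocess

    def sqoop_import(table, target_dir):
        """Run a Sqoop import for one DB2 table into HDFS."""
        cmd = [
            "sqoop", "import",
            "--connect", "jdbc:db2://example-db2-host:50000/ARCHDB",
            "--username", "etl_user",
            "--password-file", "/user/etl_user/.db2.password",
            "--table", table,
            "--target-dir", target_dir,
            "--num-mappers", "4",
            "--as-avrodatafile",          # Avro output for downstream Hive/MapReduce
        ]
        subprocess.run(cmd, check=True)   # raise if the import fails

    if __name__ == "__main__":
        sqoop_import("CLAIMS_ARCHIVE", "/data/raw/claims_archive")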
Confidential
Technical Lead
Responsibilities:
- Involved in different phases of gathering requirements, documenting the functional specifications, design, data modeling, and development of the applications.
- Developed POCs for different use cases to implement big data solutions.
- Loaded and transformed large sets of structured and semi-structured data.
- Configured workflows to run on top of Hadoop using Spring Batch and shell scripts; these workflows comprise heterogeneous jobs such as Hive and MapReduce to ingest offloaded cold data from the enterprise data warehouse into HDFS for archival.
- Created JCLs (using ICETOOL and SORT) to copy VSAM files to flat files, convert the data types to a readable format (text files), and FTP them to the Hadoop cluster (HDFS).
- Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using the MR testing library.
- Unloaded reference data from DB2 to HDFS using Sqoop.
- Worked on Hive for exposing data for further analysis.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Imported and exported data into HDFS and Hive using Sqoop.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Involved in writing Hive scripts to extract, transform, and load the data into the database.
- Developed shell-script workflows to automate running MapReduce jobs and Hive scripts on the data imported from mainframes (see the sketch after this list).
- Actively participated in the software development lifecycle (scope, design, implement, deploy, test), including design and code reviews, test development, and test automation.
- Built the Hadoop cluster, ensuring NameNode high availability, mixed-workload management, performance optimization, health monitoring, and backup and recovery across one or more nodes.
- Installed/configured/maintained Apache Hadoop clusters for application development and Hadoop tools such as Hive, HBase, ZooKeeper, and Sqoop.
- Scheduled and managed cron and Control-M jobs and wrote shell scripts to generate alerts.
- Monitored and managed daily jobs through RabbitMQ and an Apache dashboard application.
- Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream log data from servers.
- Implemented NameNode backup using NFS for high availability.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Set up alerts with Cloudera Manager for memory and disk usage on the cluster.
- Configured property files such as core-site.xml, hdfs-site.xml, and mapred-site.xml based on the requirements.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Installed the Oozie workflow engine to run multiple Hive jobs.
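The automation workflow referenced above was built with shell scripts; the hypothetical sketch below renders the same sequencing idea in Python for illustration only, with all paths, jar names, and script names as placeholders:

    # Illustrative job chain: land the mainframe extract, clean it with
    # MapReduce, then load it into Hive; stop on the first failure.
    import subprocess

    STEPS = [
        # 1. Land the FTP'd mainframe extract into HDFS
        ["hdfs", "dfs", "-put", "-f", "/staging/mainframe/claims.txt", "/data/raw/claims/"],
        # 2. Clean/standardize the raw records with a custom MapReduce job
        ["hadoop", "jar", "etl-jobs.jar", "com.example.etl.ClaimsCleanser",
         "/data/raw/claims/", "/data/clean/claims/"],
        # 3. Load the cleansed data into the Hive archive table
        ["hive", "-f", "/scripts/load_claims_archive.hql"],
    ]

    for step in STEPS:
        print("Running:", " ".join(step))
        subprocess.run(step, check=True)   # abort the chain if any step fails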
Confidential
Sr. Software Engineer
Environment: Java, JSP, Oracle, JDBC Template, Tomcat, JavaScript, XML, DHTML, CSS, HTML, JQuery
Responsibilities:
- Involved in Analysis, Design, Development and Testing of application modules.
- Analyzed complex system relationships and improved the performance of various screens.
- Developed JSP pages using JavaScript, jQuery, and AJAX for client-side validation and CSS for data formatting.
- Developed various reports using Adobe APIs and Web services.
- Preparation of Technical Design for the enhancement and maintenance requests.
- Review of Code, Unit Test Plan and Unit test results.
- Team tracking and Issue management.
- Wrote test cases using JUnit and coordinated with the testing team for integration tests.
- Fixed bugs and improved performance through root-cause analysis in production support.
Confidential
Sr. Software Engineer
Environment: Java, J2EE (JSPs & Servlets), JUnit, HTML, CSS, JavaScript, Apache Tomcat, Oracle
Responsibilities:
- Involved in different phases of gathering requirements, documenting the functional specifications, design, data modeling, and development of the applications.
- J2EE server side development to support business logic, integration, and persistence.
- Used JSP with Spring Framework for developing User Interfaces.
- Integrated Security Web Services for authentication of users.
- Responsible for Testing and moving the application into Staging and Production environments.
- Responsible for Project Documentation, Status Reporting and Presentation.
- Used CVS version control to maintain the Source Code.
Confidential
Sr. Software Engineer
Environment: Java, J2EE (JSPs & Servlets), JUnit, HTML, CSS, JavaScript, Apache Tomcat, Oracle
Responsibilities:
- Involved in requirements analysis and prepared Requirements Specifications document.
- Designed implementation logic for core functionalities
- Developed service-layer logic for core modules using JSPs and Servlets and was involved in integration with the presentation layer.
- Involved in implementation of presentation-layer logic using HTML, CSS, JavaScript, and XHTML.
- Designed the Oracle database to store customer and account details.
- Used JDBC connections to store and retrieve data from the database.
- Development of complex SQL queries and stored procedures to process and store the data
- Developed test cases using JUnit
- Involved in unit testing and bug fixing.
- Used CVS version control to maintain the Source Code.
- Prepared design documents for code developed and defect tracker maintenance.
Confidential
Sr. Programmer Analyst
Environment: Core Java, Java Batch, Service Beans, EJB, RMI/IIOP, J2EE, COBOL390, CICS, DB2
Responsibilities:
- Responsible for Proof of Concept, Planning, Designing new proposed Architecture.
- Worked on Java, Swing, Web services, XML in addition to Mainframe Technology.
- Extracted the business rules from Legacy COBOL programs to code in Java.
- Used the latest methodologies to convert the existing mainframe programs to Java and Java batch.
- Completed the migration with the limited resources available on the mainframe.
- Fine-tuned application programs with the help of the DBA.
- Utilized transaction wrapper technology (EJB, Batch, ServiceBean on a WebSphere cluster).
- Attended the functional meetings and prepared the high-level detailed design document.
- Designed high- and low-level design documents for the new functions to be implemented.
- Supported the restructuring of DB2 tables by rewriting the existing programs.
- Debugged and troubleshot technical issues while implementing the applications.
- Implemented a Java client-based OLTP process with a WebSphere server running on the mainframe z/OS host.
Confidential
Programmer Analyst
Responsibilities:
- Procured the project requirements from business analysts and users, broke the project delivery into phases, and met the deadlines per the estimates.
- Transformed the business requirements into design.
- Prepared the analysis, estimation, and design.
- Single point of contact between customer and offshore team members.
- Prepared high-level and low-level design based on business requirement document.
- Preparation of Technical Specifications by using high-level design and business requirement document.
- Provided module inventory and estimates by identifying the impacted components.
- Shared business and technical knowledge with other team members.
- Coded complex programs and report programs (batch and online) in COBOL/VSAM/DB2/CICS.
- Prepared analysis documents, modified programs/JCLs, and performed peer reviews.
- Prepared the unit test case document and the coding and unit test results documents.
- Developed the maps and the online and batch programs and performed reviews of test cases and code.
- Resolved defects in the SIT/UAT phases and provided implementation support.
- Prepared defect log and defect action plan documents after implementation.
- Mentored and motivated team members, enabling the team to work independently on tasks.