Hadoop Developer Resume
Charlotte, NC
SUMMARY
- 9+ years of IT experience in the design, development, deployment, maintenance, and support of Hadoop ecosystem components and Big Data technologies.
- 6+ years of extensive working experience with Hadoop ecosystem components such as HDFS, Spark, Hive, Sqoop, YARN, ZooKeeper, Pig, Flume, MapReduce, Kafka, and Oozie.
- Experienced in implementing Big Data projects using Hortonworks and Cloudera Distribution.
- Experience with cloud platforms such as AWS, including a POC that migrated an existing on-premises application to AWS. Used AWS services such as EC2 and S3 for processing and storing small data sets, and maintained a Hadoop cluster on AWS EMR.
- Experience with AWS services such as EMR, EC2, S3, CloudFormation, Glue, Athena, Redshift, and DynamoDB for fast and efficient processing of Big Data.
- Imported data from different sources, such as AWS S3 and the local file system, into Spark RDDs.
- Experience developing and maintaining applications written for Amazon Simple Storage Service (S3), Amazon Elastic MapReduce (EMR), and AWS CloudFormation.
- Experienced Hadoop developer with expertise in providing end-to-end solutions for real-time big data problems by implementing distributed processing concepts such as MapReduce on HDFS and other Hadoop ecosystem components.
- Experience working on large-scale big data implementations in production environments.
- Experience in loading data into Spark schema RDDs and querying them using Spark SQL.
- Experience in creating RDDs, DataFrames, and Datasets for the required data and performing transformations using Spark RDDs and Spark SQL.
- Developed analytical components using Scala, Spark, and Spark Streaming.
- Experienced in transferring data from different data sources into HDFS using Kafka producers and consumers with the Spark API for near-real-time and real-time data.
- Experience in Data Migration, importing and exporting data using Sqoop from HDFS to Relational Database systems (RDBMS) and vice-versa.
- Experienced in using Pig scripts to do transformations, event joins, filters, and some pre-aggregations before storing the data onto HDFS.
- Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
- Experienced in handling different file formats such as text, Parquet, ORC, Avro, sequence files, XML, and JSON.
- Excellent understanding and knowledge of NoSQL databases such as HBase and Cassandra.
- Experience in installing, configuring, supporting, and managing Hadoop clusters using the Cloudera (CDH 5.x) distribution and on Amazon Web Services (AWS).
- Experienced in working with Agile and Waterfall methodologies.
TECHNICAL SKILLS
Big Data: Spark, Kafka, YARN, MapReduce, NiFi, Kudu, Hive, Impala, Sqoop, Flume, Oozie, Airflow.
Cloud Technologies: AWS Lambda, EC2, EMR, S3, Glue, Step Functions, CloudFormation, SNS, CloudWatch, IAM, Athena, Redshift, Kinesis, Presto.
Languages: PL/SQL, Pig Latin, Java, HiveQL, Scala, Python
Databases: Oracle, MySQL, MS-SQL, HBase, DB2, Netezza, Teradata.
Scripting Languages: UNIX Shell script, Python, Perl.
Frameworks: Spring, Struts, Hibernate.
Web Services: SOAP, RESTful
Web Servers: WebLogic, WebSphere, Apache Tomcat.
Other tools: Git, Bitbucket, Kerberos, Kubernetes, Docker, SVN, JIRA, Autosys, Maven, Jenkins, Ansible, Control-M.
PROFESSIONAL EXPERIENCE
Confidential - Charlotte, NC
Hadoop developer
Responsibilities:
- Implementing encryption/decryption and tokenization/detokenization of data in Hive databases.
- Work with Hive through the PySpark and Scala Spark APIs to create tables, load terabytes of historical data, and create NiFi ETL processes for daily updates from BAC-S3.
- Responsible for performance tuning of the PySpark CDC application, which persists into Hive on S3: setting the right batch interval, the correct level of parallelism, and memory tuning.
- Responsible for promoting code through the CI/CD process across development, test, and production environments on schedule. Provide production support when needed.
- Responsible for data migration and handling large datasets using partitions, Spark in-memory capabilities, Spark broadcasts, efficient joins, and transformations during the ingestion process; implemented Airflow for the Spark and Hive migration jobs.
- Worked with data protection and data discovery tools such as Protegrity and Integris, and introduced these tools to the bank.
- The main objective of the data discovery POC was to identify sensitive elements such as credit card numbers, ITINs, CVVs, passwords, and other NPI elements in various RDBMS and Hive tables and file formats, and to generate reports.
- The scope of the data protection work was to test the Protegrity tool and use it to protect sensitive elements at various levels, such as data at rest and data in transit, with the help of Sqoop and Spark.
- Integrated the data discovery and data protection tools in Windows and Unix environments, installed Python, Java, and the required libraries on servers, and configured the servers to discover the various sensitive elements.
- Wrote data discovery code in PowerShell and added regexes for CC, SSN, TIN, passwords, and other sensitive elements.
- Analyzed the regexes against the sensitive elements present in databases and refined them for the various data file formats.
- Generated data discovery reports for various kinds of sensitive information and protected the sensitive elements using encryption/decryption and tokenization/detokenization methods.
- Responsible for developing and unit testing code and moving it to UAT and PROD.
- Responsible for production maintenance, support, and upgrades, and for helping the team test and deploy the project code.
Environment: Hive, Cloudera, HDFS, YARN, Kafka, S3, Impala, Sqoop, Spark, Windows PowerShell, Tomcat, Python, Shell/Unix, Agile.
Confidential, Charlotte, NC
Hadoop developer
Responsibilities:
- Played a lead role in the development of Confidential Data Lake and in building Confidential Data Cube on a Microsoft Azure HDInsight cluster.
- Responsible for managing data coming from disparate data sources.
- Ingested incremental updates from structured ERP systems residing on a Microsoft SQL Server database onto the Hadoop data platform using Sqoop.
- Implemented OLAP multi-dimensional cube functionality using Azure SQL Data Warehouse.
- Responsible for transporting and processing real-time stream data sourced from Magento and Form site APIs for inventory management using NiFi, Kafka, and Storm.
- Experience in working with RESTful APIs.
- Created HBase tables to store various data formats coming from different applications.
- Developed scripts for extracting and processing EDI POS sales data sourced from SFTP server in Hive data warehouse using Linux shell scripting.
- Implemented a proof of concept to analyze streaming data using Apache Spark with Scala; used Maven/SBT to build and deploy the Spark programs.
- Responsible for building Confidential data cube on the Spark framework by writing Spark SQL queries in Scala to improve the efficiency of data processing and reporting query response time.
- Developed Spark code in Scala in the IntelliJ IDE using SBT.
- Performance tuning of Sqoop, Hive, and Spark jobs.
- Responsible for modification of ETL data load scripts, scheduling automated jobs and resolving production issues (if any) on time.
- Developed Oozie workflows to automate the ETL process by scheduling multiple Sqoop, Hive, and Spark jobs.
- Monitored cluster status and health daily using the Ambari UI.
- Experience in rendering and delivering reports in desired formats by using reporting tools such as Tableau.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
- Responsible for programming code independently for intermediate to complex modules following development standards.
- Planned and conducted code reviews for changes and enhancements that ensure standards compliance and systems interoperability.
- Responsible for modifying the code, debugging, and testing the code before deploying on the production cluster.
Environment: Hadoop Stack, Java, Sqoop, Hive, AtScale, Oozie, Microsoft SQL Server, Kafka, Storm, Ubuntu, HBase, YARN, Hortonworks, UNIX, Shell Scripting.
Confidential, McLean, VA
Responsibilities:
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzing them by running Hive queries and Pig scripts.
- Moved all crawl data flat files generated from various retailers to HDFS for further processing.
- Used Java Persistence API (JPA) framework for object relational mapping which is based on POJO Classes.
- Wrote MapReduce jobs using Java API.
- Expertise in core Java, J2EE, JDBC, and shell scripting; proficient in using Java APIs such as Collections, Servlets, and JSP, along with Struts, Spring, Hibernate, JSON, and XML, for application development.
- Involved in building and automating CI/CD pipelines.
- Processed large data sets with Apache Beam on a Google Cloud runner. Wrote Apache Pig scripts to process HDFS data.
- Involved in migrating Hive data to Google BigQuery.
- Implemented automated workflows with Apache Airflow and integrated the scripts with Jenkins.
- Implemented Apache Beam pipelines in Java to process data.
- Used Jenkins to deploy code to Google Cloud with new namespaces, create Docker images, and push them to the Google Cloud container registry.
- Expertise in documenting the deployment process and preparing release notes, checklists, quality process docs, analysis docs, and versioned configuration docs.
- Led many formal and informal sessions to educate teams on security issues and the importance of best practices in GCP.
- Expertise in designing Google Cloud architecture in line with financial regulations from a security point of view.
- Used the Spark Java web framework, an open source framework for creating modern Java web applications.
- Built and configured a virtual data center in Google Cloud to support Enterprise Data Warehouse hosting, including Virtual Private Cloud (VPC), public and private subnets, security groups, route tables, and Elastic Load Balancer.
- Created a Hive table to store the processed results in tabular format.
- Involved in building multitenant solutions using Python and internal tools, delivering complex cloud platforms.
- Developed Java applications using various IDEs like Spring Tool Suite and Eclipse.
- Developed Sqoop scripts to move data between Pig and the MySQL database.
Environment: Hadoop, Hive, MapReduce, Sqoop, Kafka, Spark, Airflow, GCP
Confidential, Michigan
Hadoop/Spark Developer
Responsibilities:
- Expert in implementing advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Developed and executed shell scripts to automate the jobs.
- Wrote complex Hive queries and UDFs.
- Worked on the core and Spark SQL modules of Spark extensively.
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Experienced in defining job flows using Oozie.
- Experienced in collecting, aggregating, and moving large amounts of streaming data into HDFS using Flume.
- Ran ad hoc queries using Pig Latin, Hive, or Java MapReduce.
- Developed PowerCenter mappings to extract data from various databases and flat files and load it into the data mart using Informatica 8.6.1.
- Involved in log file management: logs older than 7 days were removed from the log folder, loaded into HDFS, and retained for 3 months.
- Conducted Scrum Daily stand up, Product backlog, Sprint Planning, Sprint Review & Sprint Retrospective meetings.
- Worked with the reporting team to generate reports from the data mart using Cognos.
- Used Spring config server for centralized configuration and Splunk for centralized logging. Used Concourse and Jenkins for microservices deployment.
- Implemented Zipkin for distributed microservice monitoring. Integrated Swagger UI and wrote integration tests along with REST documentation.
- Created the RFP microservice to provide a RESTful API using Spring Boot.
- Built microservices with Spring Boot to serve multiple applications across the organization; data is provided and consumed as JSON.
Environment: Apache Hadoop, EDW, SQL Server 2005, TOAD, Rapid SQL, Oracle 10g, HDFS, MapReduce, VMware, Hive, Pig, HBase, Sqoop, Flume, UNIX, DB2.
Confidential, Washington
Java Developer
Responsibilities:
- Involved in different phases of Software Development Lifecycle (SDLC) like Requirements gathering, Analysis, Design and Development of the application.
- Involved in designing and implementing the MVC design pattern using the Spring framework for the web tier.
- Worked on SOAP and RESTful web services.
- Involved in developing the user interface using Struts.
- Wrote several Action Classes and Action Forms to capture user input and created different web pages using JSTL, JSP, HTML, Custom Tags, and Struts Tags.
- Designed and developed Message Flows, Message Sets, and other service components to expose mainframe applications to enterprise J2EE applications.
- Used standard data access technologies such as JDBC and ORM tools such as Hibernate.
- Worked on various client websites that used Struts 1 framework and Hibernate.
- Wrote test cases using JUnit testing framework and configured applications on WebLogic Server.
- Involved in writing stored procedures, views, user-defined functions, and triggers in the SQL Server database for the Reports module.
Environment: Java, Spring MVC, Struts, RESTful, JSP, JUnit, Eclipse, JIRA, JDBC, Struts 1, Hibernate, WebLogic, Oracle 9i.
