Senior Big Data/Hadoop Developer Resume
SUMMARY
- Over 8 years of relevant IT experience as a Hadoop Developer in building distributed applications and high-quality software, with object-oriented methods, project leadership, and rapid, reliable development.
- Experience with Hadoop and related Big Data tools, including Spark, Hive, Kafka, Apache Mesos, Cascading, and Hadoop MapReduce, using Scala, Python, and Java.
- Solid background in DBMS technologies such as Oracle, MySQL, and NoSQL stores, as well as data warehousing architectures; performed migrations from SQL Server, Oracle, and MySQL databases to Hadoop.
- Experienced in the successful implementation of ETL solutions between OLTP and OLAP databases in support of Decision Support Systems, with expertise in all phases of the SDLC.
- Developed automated Unix shell scripts for running the Balancer, file system operations, schema creation in Hive, and user/group creation on HDFS.
- Experienced in developing MapReduce programs on Apache Hadoop for working with Big Data, and in Hadoop architecture based on the MapReduce programming paradigm.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
- Proficient in configuring ZooKeeper, Cassandra, and Flume on existing Hadoop clusters.
- Experience in converting Hive or SQL queries into Spark transformations using Python and Scala (a brief PySpark sketch follows this summary).
- Experience using Spark to improve the performance and optimization of existing algorithms in Hadoop with SparkContext, Spark SQL, PySpark, pair RDDs, and Spark on YARN.
- Experience in deploying and managing multi-node development, testing, and production Hadoop clusters with different Hadoop components using Hortonworks Ambari.
- Experience in Cloudera Hadoop upgrades, patches, and installation of ecosystem products through Cloudera Manager, along with Cloudera Manager upgrades.
- Worked on provisioning and deploying multi-tenant Hadoop clusters on the Amazon Web Services (AWS) public cloud and on private cloud infrastructure, using various AWS components.
- Experienced in working with Amazon Web Services, using EC2 for compute and S3 as a storage mechanism.
- Experience with different file formats such as text, Avro, and Parquet for Hive querying and processing.
- Experienced in managing Linux servers and in installing, configuring, supporting, and managing Hadoop clusters.
- Worked with different IDEs such as Oracle JDeveloper, WebLogic Workshop, NetBeans, and Eclipse.
- Hands-on experience with different software development approaches such as Spiral, Waterfall, and Agile iterative models.
- Experience in using Sqoop to import data from different relational databases into Cassandra tables.
- Expertise in support activities, including installation, configuration, and successful deployment of changes across all environments.
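As a brief illustration of the Hive-to-Spark conversion mentioned above, the following is a minimal PySpark sketch; the table and column names (sales, region, amount) are hypothetical placeholders rather than values from any specific engagement.

```python
# A minimal sketch (not project code): expressing a Hive/SQL aggregation
# as equivalent Spark DataFrame transformations. The table and column
# names (sales, region, amount) are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("hive-to-spark-sketch")
    .enableHiveSupport()          # lets Spark read Hive metastore tables
    .getOrCreate()
)

# Hive/SQL form of the query:
#   SELECT region, SUM(amount) AS total_amount
#   FROM sales
#   WHERE amount > 0
#   GROUP BY region;
sql_result = spark.sql(
    "SELECT region, SUM(amount) AS total_amount FROM sales WHERE amount > 0 GROUP BY region"
)

# The same query rewritten as DataFrame transformations:
df_result = (
    spark.table("sales")
    .where(F.col("amount") > 0)
    .groupBy("region")
    .agg(F.sum("amount").alias("total_amount"))
)

df_result.show()
```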
TECHNICAL SKILLS
Operating Systems: Windows, Linux distributions like Ubuntu, CentOS, RHEL
Data stores: Oracle, NoSQL, MySQL, HBase, Cassandra, MongoDB
Big Data: MapReduce, HDFS, Flume, Hive, Pig, Oozie, YARN, Hortonworks, Cassandra, Hadoop, Kafka, Sqoop, Impala, ZooKeeper, Spark, Ambari, Mahout, MongoDB, Avro, Storm, and Parquet
ETL: Talend Open Studio, Informatica
Programming Languages: R, Scala, Python, C, C++, C#, JavaScript, HTML, DHTML, XML, Bash, Perl, SQL, and *nix tools
NoSQL Databases: MongoDB and HBase
Cloud Services: AWS (EMR, S3), Azure, Docker, Kubernetes
Application Servers: WebLogic 11g and 12c, Tomcat 5.x and 6.x
PROFESSIONAL EXPERIENCE
Confidential
Senior Big Data/Hadoop Developer
Responsibilities:
- Developed and analyzed large volumes of customer data for marketing campaigns, sales management, inventory management etc.
- Created and maintained Technical documentation for launching Hadoop clusters and for executing Hive queries and Pig Scripts.
- Monitored and reviewed Hadoop log files and wrote queries to analyze them.
- Studied the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MRv2, Hive, Sqoop, and Pig Latin.
- Loaded data from different data sources (Teradata and DB2) into HDFS using Sqoop and Flume, and loaded it into partitioned Hive tables.
- Wrote various MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed formats, and stored the refined data in partitioned tables in the EDW.
- Developed Hive SQL queries, mappings, tables, and external tables in Hive for analysis across different banners, and worked on partitioning, optimization, compilation, and execution.
- Wrote complex queries to load data into HBase and was responsible for executing Hive queries using the Hive command line, Hue, and Impala to read and write data.
- Designed and implemented proprietary data solutions by correlating data from SQL and NoSQL databases using Kafka.
- Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the analyzed data in HDFS.
- Worked with AWS cloud services (VPC, EC2, S3, RDS, Redshift, Data Pipeline, EMR, DynamoDB, WorkSpaces, Lambda, Kinesis, SNS, SQS).
- Was part of the team responsible for setting up the infrastructure in AWS.
- Used Amazon S3 as a storage mechanism and wrote Python scripts that dump data into S3.
- Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata to identify any performance lag.
- Developed PySpark code for saving data in Avro and Parquet formats and building Hive tables on top of them (see the sketch at the end of this section).
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Used the Pentaho Data Integration tool for data integration, OLAP analysis, and ETL processes.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
- Developed Bash scripts to pull TLog files from the FTP server and process them for loading into Hive tables; all Bash scripts are scheduled using the Resource Manager Scheduler.
- Developed Spark programs using Scala, created Spark SQL queries, and developed Oozie workflows for Spark jobs.
- Conducted POCs and mocks with the client to understand the business requirements, and attended defect triage meetings with the UAT and QA teams to ensure defects were resolved in a timely manner.
- Worked with Kafka on a proof of concept for carrying out log processing on a distributed system.
- Used ZooKeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
- Continuously monitored and managed the Hadoop cluster using Hortonworks Ambari.
Environment: HDFS, Hadoop 2.x (YARN), Teradata, NoSQL, PySpark, data warehouse, MapReduce, Pig, Hive, Sqoop, Spark, Scala, Oozie, Hortonworks, Java, Oracle 10g, Python, MongoDB, shell and Bash scripting.
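The sketch below illustrates the Avro/Parquet save-and-register pattern referenced in the responsibilities above. It is a minimal example only; the application name, paths, columns, and table name are assumed placeholders, and the Avro write assumes the spark-avro package is available.

```python
# A minimal sketch (hypothetical paths, columns, and table name): refine raw
# data with PySpark, write it as Parquet and Avro, and register a Hive table
# over the Parquet output.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("avro-parquet-hive-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Illustrative source: JSON records landed in HDFS.
raw = spark.read.json("hdfs:///data/raw/tlog/")

refined = (
    raw.selectExpr(
        "store_id",
        "CAST(txn_ts AS TIMESTAMP) AS txn_ts",
        "CAST(amount AS DOUBLE) AS amount",
    )
    .where("amount IS NOT NULL")
)

# Partitioned Parquet output for analytical queries.
refined.write.mode("overwrite").partitionBy("store_id").parquet(
    "hdfs:///data/refined/tlog_parquet/"
)

# Avro output (requires spark-avro on the classpath).
refined.write.mode("overwrite").format("avro").save(
    "hdfs:///data/refined/tlog_avro/"
)

# External Hive table on top of the Parquet files.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS tlog_refined (txn_ts TIMESTAMP, amount DOUBLE)
    PARTITIONED BY (store_id STRING)
    STORED AS PARQUET
    LOCATION 'hdfs:///data/refined/tlog_parquet/'
""")
spark.sql("MSCK REPAIR TABLE tlog_refined")  # register the written partitions
```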
Confidential
Hadoop Developer
Responsibilities:
- Involved in implementing the design through the key phases of the software development life cycle (SDLC), including development, testing, implementation, and maintenance support.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Analyzed the SQL scripts and designed the solution to be implemented using PySpark.
- Responsible for building scalable distributed data pipelines using Hadoop.
- Extracted files from RDBMS sources through Sqoop, placed them in HDFS, and processed them.
- Involved in review of functional and non-functional requirements.
- Ingested data into HDFS from various mainframe DB2 tables using Sqoop, and used Apache Kafka to track data ingestion into the Hadoop cluster.
- Developed Hadoop Streaming map/reduce jobs using Python (a minimal sketch appears at the end of this section).
- Programmed Spark using Scala and developed Spark scripts in Scala per the requirements.
- Wrote Pig scripts to debug Kafka hourly data and perform daily roll-ups.
- Migrated data from existing Teradata systems to HDFS and built datasets on top of it.
- Built a shell-script framework to automate Hive registration, handling dynamic table creation and automatically adding new partitions to tables.
- Designed Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and bucketing.
- Set up and benchmarked Hadoop/HBase clusters for internal use; developed simple to complex MapReduce programs.
- Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Developed Oozie workflows that chain Hive/MapReduce modules for ingesting periodic/hourly input data.
- Wrote Pig and Hive scripts to analyze the data and detect user patterns.
- Implemented Device based business logic using Hive UDFs to perform ad-hoc queries on structured data.
- Stored and loaded data from HDFS to Azure and backed up the namespace data to NFS filers.
- Prepared Avro schema files for generating Hive tables, and shell scripts for executing Hadoop commands in a single run.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Worked with the administration team to install the operating system and apply Hadoop updates, patches, and version upgrades as required.
- Developed ETL pipelines to deliver data to business intelligence teams for building visualizations.
- Involved in unit testing, interface testing, system testing, and user acceptance testing of the workflow tool.
Environment: Azure, Unix, Git, Chef, Jira, Nagios, Tomcat, Jenkins, SAN, virtualization, Windows and Linux operating systems, Workflow & Approvals, ITSM Remedy, Reports, network protocols, SQL database, and monitoring tools.
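The sketch below is a minimal Hadoop Streaming job in Python of the kind described above; the word-count logic, file names, and invocation are illustrative assumptions rather than the actual project code.

```python
#!/usr/bin/env python
# A minimal Hadoop Streaming sketch (illustrative word count, not project code).
# The same file acts as mapper and reducer, e.g. (paths are hypothetical):
#   hadoop jar hadoop-streaming.jar \
#     -input /data/in -output /data/out \
#     -mapper "python wordcount.py map" \
#     -reducer "python wordcount.py reduce" \
#     -file wordcount.py
import sys


def mapper():
    # Emit "word<TAB>1" for every token read from stdin.
    for line in sys.stdin:
        for word in line.strip().split():
            print(word + "\t1")


def reducer():
    # Streaming sorts by key, so all counts for a word arrive contiguously.
    current_word, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t", 1)
        if word != current_word:
            if current_word is not None:
                print("{0}\t{1}".format(current_word, count))
            current_word, count = word, 0
        count += int(value)
    if current_word is not None:
        print("{0}\t{1}".format(current_word, count))


if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```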
Confidential
Hadoop Developer
Responsibilities:
- Actively participated in requirements gathering, analysis, design, and testing phases.
- Designed use case diagrams, class diagrams, and sequence diagrams as a part of Design Phase.
- Developed the entire application implementing the MVC architecture, integrating JSF with the Hibernate and Spring frameworks.
- Developed Enterprise JavaBeans (stateless session beans) to handle different transactions such as online funds transfers and bill payments to service providers.
- Implemented Service-Oriented Architecture (SOA) using JMS for sending and receiving messages while creating web services.
- Developed XML documents and generated XSL files for Payment Transaction and Reserve Transaction systems.
- Enhanced an address book application developed using AngularJS by destroying unwanted watches and separating business logic into services; designed a Backbone collection to offer a combination of promotions for given SKUs.
- Developed SQL queries and stored procedures.
- Developed Web Services for data transfer from client to server and vice versa using Apache Axis, SOAP and WSDL.
- Used the JUnit framework for unit testing of all the Java classes.
- Implemented various J2EE design patterns such as Singleton, Service Locator, and SOA.
- Worked on AJAX to develop an interactive web application and JavaScript for data validations.
Environment: J2EE, JDBC, Servlets, JSP, Struts, Hibernate, web services, SOAP, WSDL, design patterns, MVC, HTML, JavaScript, WebLogic, XML, JUnit, Oracle, WebSphere, Eclipse.
Confidential
Software Engineer
Responsibilities:
- Designed use cases, activities, states, objects and components.
- Developed the UI pages using HTML, DHTML, JavaScript, Ajax, jQuery, JSP, and tag libraries.
- Developed front-end screens using JSP and tag libraries.
- Performed validations between various users.
- Designed Java servlets and objects using J2EE standards.
- Coded HTML, JSP and Servlets.
- Developed an internal application using Angular and Node.js connecting to Oracle on the backend.
- Coded XML validation and file segmentation classes for splitting large XML files into smaller segments using the SAX parser.
- Created new connections through application coding for better access to the DB2 database, and wrote SQL and PL/SQL stored procedures, functions, sequences, triggers, cursors, object types, etc.
- Implemented application using Struts MVC framework for maintainability.
- Involved in testing and deploying to the development server.
- Wrote Oracle stored procedures (PL/SQL) and called them using JDBC.
- Involved in designing the database tables in Oracle.
Environment: Java 1.7, J2EE, Apache Tomcat, CVS, JSP, Servlets, Struts, PL/SQL, and Oracle.