Senior Big Data/Hadoop Developer Resume
SUMMARY
- Over 8 years of relevant IT experience as a Hadoop Developer, building distributed applications and high-quality software with object-oriented methods, project leadership, and rapid, reliable delivery.
- Experience with Hadoop and related Big Data tools, including Spark, Hive, Kafka, Apache Mesos, Cascading, and Hadoop MapReduce, using Scala, Python, and Java.
- Solid background in DBMS technologies such as Oracle, MySQL, and NoSQL stores and in data warehousing architectures; performed migrations from SQL Server, Oracle, and MySQL databases to Hadoop.
- Experienced in the successful implementation of ETL solutions between OLTP and OLAP databases in support of Decision Support Systems, with expertise in all phases of the SDLC.
- Developed automated Unix shell scripts for running the HDFS balancer and file system operations, schema creation in Hive, and user/group creation on HDFS.
- Experienced in developing MapReduce programs with Apache Hadoop for working with Big Data and in Hadoop architecture based on the MapReduce programming paradigm.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Proficient in configuring ZooKeeper, Cassandra, and Flume on existing Hadoop clusters.
- Experience in converting Hive or SQL queries into Spark transformations using Python and Scala (an illustrative sketch follows this summary).
- Experience using Spark to improve the performance and optimization of existing algorithms in Hadoop, working with SparkContext, Spark SQL, PySpark, pair RDDs, and Spark on YARN.
- Experience in deploying and managing multi-node development, testing, and production Hadoop clusters with different Hadoop components using Hortonworks Ambari.
- Experience with Cloudera Hadoop upgrades, patches, and installation of ecosystem products through Cloudera Manager, along with Cloudera Manager upgrades.
- Provisioned and deployed multi-tenant Hadoop clusters on the Amazon Web Services (AWS) public cloud and on private cloud infrastructure, using various AWS components.
- Experienced in working with Amazon Web Services, using EC2 for compute and S3 for storage.
- Experience with different file formats such as text, Avro, and Parquet for Hive querying and processing.
- Experienced in managing Linux servers and in installing, configuring, supporting, and managing Hadoop clusters.
- Worked with different IDEs such as Oracle JDeveloper, WebLogic Workshop, NetBeans, and Eclipse.
- Hands-on experience with different software development approaches such as Spiral, Waterfall, and Agile iterative models.
- Experience using Sqoop to import data into Cassandra tables from different relational databases.
- Expertise in support activities, including installation, configuration, and successful deployment of changes across all environments.
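
A minimal, illustrative PySpark sketch of the Hive-to-Spark conversion pattern referenced above; the table and column names (sales, region, amount, year) are hypothetical placeholders, not details of a specific engagement.

```python
# Hedged sketch: rewriting a HiveQL aggregation as Spark DataFrame transformations.
# Assumes a Hive metastore is available to the Spark session.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-spark-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Original HiveQL:
#   SELECT region, SUM(amount) AS total_amount
#   FROM sales WHERE year = 2017 GROUP BY region;
totals = (spark.table("sales")
          .filter(F.col("year") == 2017)
          .groupBy("region")
          .agg(F.sum("amount").alias("total_amount")))

totals.show()
```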
TECHNICAL SKILLS
Operating Systems: Windows, Linux distributions like Ubuntu, CentOS, RHEL
Data stores: Oracle, MySQL, NoSQL, HBase, Cassandra, MongoDB
Big Data: MapReduce, HDFS, Flume, Hive, Pig, Oozie, YARN, Hortonworks, Cassandra, Hadoop, Kafka, Sqoop, Impala, ZooKeeper, Spark, Ambari, Mahout, MongoDB, Avro, Storm and Parquet.
ETL: Talend Open Studio, Informatica
Programming Languages: R, Scala, Python, C, C++, C#, JavaScript, HTML, DHTML, XML, Bash, Perl, SQL and *nix tools.
NoSQL Databases: MongoDB and HBase
Cloud Services: AWS EMR, S3, Azure, Docker, Kubernetes
Application Servers: WebLogic 11g, 12c, Tomcat 5.x and 6.x
PROFESSIONAL EXPERIENCE
Confidential
Senior Big Data/ Hadoop Developer
Responsibilities:
- Processed and analyzed large volumes of customer data for marketing campaigns, sales management, inventory management, etc.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Monitored and reviewed Hadoop log files and wrote queries to analyze them.
- Studied the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MRv2, Hive, Sqoop, and Pig Latin.
- Loaded data from different sources (Teradata and DB2) into HDFS using Sqoop and Flume and loaded it into partitioned Hive tables.
- Wrote various MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed formats, and stored the refined data in partitioned tables in the EDW.
- Developed HiveQL queries, mappings, tables, and external tables in Hive for analysis across different banners, and worked on partitioning, optimization, compilation, and execution.
- Wrote complex queries to load data into HBase and was responsible for executing Hive queries using the Hive command line, Hue, and Impala to read and write data.
- Designed and implemented proprietary data solutions by correlating data from SQL and NoSQL databases using Kafka.
- Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the analyzed data in HDFS.
- Worked with AWS cloud services (VPC, EC2, S3, RDS, Redshift, Data Pipeline, EMR, DynamoDB, WorkSpaces, Lambda, Kinesis, SNS, SQS).
- Was part of the team that set up the infrastructure in AWS.
- Used Amazon S3 as a storage mechanism and wrote Python scripts that dump data into S3.
- Developed multiple POCs using PySpark and deployed them on the YARN cluster, comparing the performance of Spark against Hive and SQL/Teradata to identify any performance lag.
- Developed PySpark code for saving data in Avro and Parquet formats and building Hive tables on top of them (a sketch follows at the end of this section).
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
- Used the Pentaho Data Integration tool for data integration, OLAP analysis, and ETL processes.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
- Developed Bash scripts to pull TLog files from the FTP server and process them for loading into Hive tables; all Bash scripts were scheduled using the Resource Manager scheduler.
- Developed Spark programs using Scala, created Spark SQL queries, and developed Oozie workflows for Spark jobs.
- Conducted POCs and mocks with the client to understand the business requirements, and attended defect triage meetings with the UAT and QA teams to ensure defects were resolved in a timely manner.
- Worked with Kafka on a proof of concept for carrying out log processing on a distributed system.
- Used ZooKeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
- Continuously monitored and managed the Hadoop Cluster using Hortonworks Ambari.
Environment: HDFS, Hadoop 2.x YARN, Teradata, NoSQL, PySpark, Data Warehouse, MapReduce, Pig, Hive, Sqoop, Spark, Scala, Oozie, Hortonworks, Java, Oracle 10g, Python, MongoDB, shell and Bash scripting.
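
The PySpark Avro/Parquet pattern mentioned in this section, shown as a hedged sketch: the paths, database, table, and column names are hypothetical, and the built-in "avro" format assumes Spark 2.4+ (earlier versions need the spark-avro package).

```python
# Hedged sketch: save a DataFrame as Parquet and Avro and expose the Parquet
# output as an external Hive table. Names and paths are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("avro-parquet-hive-sketch")
         .enableHiveSupport()
         .getOrCreate())

df = spark.table("staging.tlog_raw")  # hypothetical source table

# Columnar Parquet copy, partitioned by load date, for analytical queries.
(df.write
   .mode("overwrite")
   .partitionBy("load_date")
   .parquet("hdfs:///data/refined/tlog_parquet"))

# Row-oriented Avro copy for downstream consumers that expect Avro files.
(df.write
   .mode("overwrite")
   .format("avro")
   .save("hdfs:///data/refined/tlog_avro"))

# Build a Hive external table on top of the Parquet output and register partitions.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS analytics.tlog_refined (
        store_id STRING, txn_id STRING, amount DOUBLE)
    PARTITIONED BY (load_date STRING)
    STORED AS PARQUET
    LOCATION 'hdfs:///data/refined/tlog_parquet'
""")
spark.sql("MSCK REPAIR TABLE analytics.tlog_refined")
```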
Confidential
Hadoop Developer
Responsibilities:
- Involved in implementing designs through the vital phases of the software development life cycle (SDLC), including development, testing, implementation, and maintenance support.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Analyzed the SQL scripts and designed the solution to implement them using PySpark.
- Responsible for building scalable distributed data pipelines using Hadoop.
- Extracted files from RDBMSs through Sqoop, placed them in HDFS, and processed them.
- Involved in review of functional and non-functional requirements.
- Ingested data into HDFS from various mainframe DB2 tables using Sqoop, and used Apache Kafka to track data ingestion into the Hadoop cluster.
- Developed Hadoop Streaming MapReduce jobs using Python.
- Programmed Spark using Scala and developed Spark scripts in Scala as per requirements.
- Wrote Pig scripts to debug Kafka hourly data and perform daily roll-ups.
- Migrated data from existing Teradata systems to HDFS and built datasets on top of it.
- Built a framework using shell scripts to automate Hive registration, handling dynamic table creation and automatically adding new partitions to tables (an equivalent sketch appears at the end of this section).
- Designed Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
- Set up and benchmarked Hadoop/HBase clusters for internal use; developed simple to complex MapReduce programs.
- Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
- Developed Oozie workflows that chain Hive/MapReduce modules for ingesting periodic/hourly input data.
- Wrote Pig & Hive scripts to analyze the data and detect user patterns.
- Implemented device-based business logic using Hive UDFs to perform ad hoc queries on structured data.
- Stored and loaded data from HDFS to Azure and backed up namespace data to NFS filers.
- Prepared Avro schema files for generating Hive tables, and shell scripts for executing Hadoop commands in a single run.
- Continuously monitored and managed the Hadoop cluster by using Cloudera Manager.
- Worked with the administration team to install the operating system, Hadoop updates, patches, version upgrades as required.
- Developed ETL pipelines to source data for business intelligence teams to build visualizations.
- Involved in unit testing, interface testing, system testing, and user acceptance testing of the workflow tool.
Environment: Azure, Unix, GIT, Chef, Jira, Nagios, Tomcat, Jenkins, SAN, Virtualization, Windows and Linux Operating Systems, Workflow & Approvals, ITSM remedy, Reports, Network Protocols, SQL Database and Monitoring Tools.
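
The Hive registration framework above was implemented with shell scripts; the sketch below expresses the same idea in PySpark for illustration, with hypothetical database, table, and path names.

```python
# Hedged sketch: create a partitioned external Hive table if needed and
# register the partition written by the latest ingest run. All names are
# placeholders, and in practice the load date would come from the scheduler.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partition-registration-sketch")
         .enableHiveSupport()
         .getOrCreate())

db, table = "edw", "events"   # hypothetical target table
load_date = "2018-01-15"      # hypothetical partition value

# One-time (idempotent) table creation.
spark.sql(f"""
    CREATE EXTERNAL TABLE IF NOT EXISTS {db}.{table} (
        event_id STRING, event_type STRING, payload STRING)
    PARTITIONED BY (load_date STRING)
    STORED AS PARQUET
    LOCATION 'hdfs:///data/{db}/{table}'
""")

# Register the newly ingested partition so Hive queries can see it.
spark.sql(f"""
    ALTER TABLE {db}.{table}
    ADD IF NOT EXISTS PARTITION (load_date = '{load_date}')
    LOCATION 'hdfs:///data/{db}/{table}/load_date={load_date}'
""")
```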
Confidential
Hadoop Developer
Responsibilities:
- Actively participated in requirements gathering, analysis, design, and testing phases.
- Designed use case diagrams, class diagrams, and sequence diagrams as a part of Design Phase.
- Developed the entire application implementing the MVC architecture, integrating JSF with the Hibernate and Spring frameworks.
- Developed Enterprise JavaBeans (stateless session beans) to handle transactions such as online funds transfers and bill payments to service providers.
- Implemented a Service-Oriented Architecture (SOA) using JMS for sending and receiving messages while creating web services.
- Developed XML documents and generated XSL files for the Payment Transaction and Reserve Transaction systems.
- Enhanced an address book application built with AngularJS, destroying unwanted watches and separating business logic into services; designed a Backbone collection to offer a combination of promotions for given SKUs.
- Developed SQL queries and stored procedures.
- Developed Web Services for data transfer from client to server and vice versa using Apache Axis, SOAP and WSDL.
- Used the JUnit framework for unit testing of all the Java classes.
- Implemented various J2EE design patterns such as Singleton, Service Locator, and SOA.
- Worked with AJAX to develop an interactive web application and JavaScript for data validations.
Environment: J2EE, JDBC, Servlets, JSP, Struts, Hibernate, Web services, SOAP, WSDL, Design Patterns, MVC, HTML, JavaScript, WebLogic, XML, JUnit, Oracle, WebSphere, Eclipse.
Confidential
Software Engineer
Responsibilities:
- Designed use cases, activities, states, objects and components.
- Developed the UI pages using HTML, DHTML, JavaScript, AJAX, jQuery, JSP, and tag libraries.
- Developed front-end screens using JSP and Tag Libraries.
- Performed validations between various users.
- Designed Java servlets and objects using J2EE standards.
- Coded HTML, JSP and Servlets.
- Developed an internal application using Angular and Node.js connecting to Oracle on the backend.
- Coded XML validation and file segmentation classes for splitting large XML files into smaller segments using a SAX parser.
- Created new connections through application code for better access to the DB2 database and wrote SQL and PL/SQL stored procedures, functions, sequences, triggers, cursors, object types, etc.
- Implemented application using Struts MVC framework for maintainability.
- Involved in testing and deploying in the development server.
- Wrote Oracle stored procedures (PL/SQL) and called them using JDBC.
- Involved in designing the database tables in Oracle.
Environment: Java 1.7, J2EE, Apache Tomcat, CVS, JSP, Servlets, Struts, PL/SQL, and Oracle.