Sr. Hadoop, Spark Developer Resume
Menomonee, WI
SUMMARY:
- IT Professional with 8+ years of extensive experience in all phases of Software Development Life Cycle (SDLC) with expertise in data analysis, design, development, testing and deployment of software systems.
- 4+ years of hands-on experience with Big Data and Apache Hadoop ecosystem components such as MapReduce, HDFS, Sqoop, Flume, Spark, Spark Streaming, Pig, Hive, HBase, Oozie, Kafka, and ZooKeeper.
- Experience with different Hadoop distributions, including Cloudera (CDH4 and CDH5) and Hortonworks.
- Excellent understanding of Hadoop architecture and its components, such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Resource Manager, and MapReduce.
- Expertise in installing, configuring, supporting, and managing Hadoop clusters and the underlying Big Data infrastructure.
- Experience with cloud computing infrastructure such as AWS EC2.
- Experience in partitioning, bucketing, join optimization, and query optimization in Hive, including automating Hive queries with dynamic partitioning.
- Good understanding of and experience with NoSQL databases such as Cassandra and MongoDB.
- Hands-on experience with Spark, Spark Streaming, and Scala, including creating and working with DataFrames in Spark using Scala.
- Experience in installing, configuring, managing, supporting, and monitoring Hadoop clusters across distributions and platforms such as Apache Spark, Cloudera, and the AWS service console.
- Experience in optimizing MapReduce jobs using combiners and partitioners to deliver efficient results.
- Proficient in ETL tools for data warehouse design, business intelligence, analytics, data mining, data mapping, data conversion, and data migration/transformation from source to target systems.
- Experienced in using Kafka as a distributed publisher-subscriber messaging system.
- Experience in continuous integration, automated deployment, and release management using Jenkins and UDeploy.
- Diverse experience working with a variety of databases such as Oracle, MySQL, IBM DB2, and Netezza.
- Experienced in using IDEs and Tools like Eclipse, NetBeans, GitHub, Jenkins, Maven and IntelliJ.
- Extensive experience in creating Tableau dashboards using stacked bars, bar graphs, and geographical maps.
- Good knowledge on various scripting languages like Linux/Unix shell scripting and Python.
- Experience in developing web applications using Core Java and J2EE technologies such as HTML, JSP, jQuery, JDBC, Hibernate, and custom tags, implementing client-side validations with JavaScript and server-side validations with the Struts and Spring validation frameworks.
- Versatile team player with excellent communication, project management, documentation, interpersonal skills with ability to adapt to rapidly changing environment and quickly learn new technologies.
TECHNICAL SKILLS:
Big Data: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Avro, Spark, Spark Streaming, Storm, Kafka, YARN, Zookeeper, HBase, Impala, Cassandra
Hadoop Distributions: Cloudera, Hortonworks and MapR
Databases: SQL Server, MySQL, Oracle, Netezza
Languages: Java, C, HTML, Scala, SQL, PL/SQL, UNIX Shell Script, Python
JEE Technologies: JSP, JDBC
Frameworks: Struts (MVC), Spring, Hibernate
Build Tools: SBT, Maven and Gradle
IDEs: Eclipse, IntelliJ, NetBeans
CI Tools: Hudson/Jenkins
Cloud Solutions: AWS EMR, S3
Version Control/Configuration: Git, SVN, CVS
Defects Triage: JIRA and Bugzilla
Operating Systems: Windows, UNIX, Linux, Ubuntu, CentOS
Packages: MS Office Suite, MS Visio, MS Project Professional
File Formats: Avro, JSON, Parquet, Sequence, XML, CSV
Reporting Tools: Tableau
PROFESSIONAL EXPERIENCE:
Confidential, Menomonee, WI
Sr. Hadoop, Spark Developer
Roles & Responsibilities:
- Coordinated with business customers to gather business requirements, interacted with other technical peers to derive technical requirements, and delivered the BRD and TDD documents.
- Worked on analyzing a Hadoop 2.7.2 cluster and different Big Data analytic tools, including Pig 0.16.0, Hive 2.0, the HBase 1.1.2 database, and Sqoop 1.4.6.
- Implemented Spark 2.0 using Scala 2.11.8 and Spark SQL for faster processing of data (a DataFrame sketch follows this role's environment line).
- Implemented algorithms for real-time analysis in Spark.
- Used Spark for interactive queries, streaming data processing, and data integration.
- Installed and configured Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, Oozie, ZooKeeper, HBase, Flume, and Sqoop.
- Implemented multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Worked in a team supporting a 30-node cluster and expanded the cluster by adding nodes; the additional data nodes were configured through the Hadoop commissioning process.
- Imported data from MySQL and Oracle into HDFS using Sqoop.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Used Impala for querying HDFS data.
- Developed and implemented two service endpoints (end to end) in Java using the Play framework, Akka, and Hazelcast.
- Used AWS services such as EC2 and S3 for small data sets.
- Ingested data from RDBMS sources, performed data transformations, and then exported the results to Cassandra.
- Developed Pig UDFs to pre-process the data for analysis.
- Used Apache Kafka to collect data from producers, which in turn push the data to brokers.
- Wrote robust, reusable Hive scripts and UDFs in Java.
- Worked on test-driven development (TDD) and acceptance testing using Behave.
- Implemented partitioning and bucketing in Hive for better organization of the data.
- Designed and built unit tests and executed operational queries on HBase.
- Built Apache Avro schemas for publishing messages to topics and enabled the relevant serialization formats for message publishing and consumption.
- Connected Tableau from the client end to AWS IP addresses and viewed the end results.
- Used Spark API over Hadoop YARN to perform analytics on data in Hive.
- Implemented a script to transmit information from Oracle to HBase using Sqoop.
- Installed Hadoop (MapReduce and HDFS) and developed multiple MapReduce jobs along with Pig and Hive scripts for data cleaning and pre-processing.
- Wrote MapReduce programs to convert text files into Avro format and load them into Hive tables.
- Assisted in upgrading, configuring, and maintaining various Hadoop infrastructure components such as Pig, Hive, and HBase.
- Performed real-time analysis on incoming data using Pig, Hive, and MapReduce.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Loaded data into HBase using both bulk and non-bulk loads.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation, queries, and writing data back into the OLTP system directly or through Sqoop.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Developed a data pipeline using Kafka to store data in HDFS (a streaming sketch follows this role's environment line).
- Worked on migrating MapReduce programs to Spark transformations.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Involved in identifying and analyzing defects, questionable function errors, and inconsistencies in output.
Environment: Hadoop, HDFS, MapReduce, YARN, Spark, Pig, Hive, Sqoop, Flume, Kafka, HBase, Oozie, Scala, Java, SQL scripting, Linux shell scripting, Eclipse, AWS, Avro, Oracle, UNIX, Tableau
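For illustration, a minimal sketch of the Spark SQL/DataFrame style of work described above, written for Spark 2.x with Scala. The database, table, column names, and output path are hypothetical placeholders, not project specifics.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SalesRollup {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session, as used when Spark runs over YARN alongside Hive
    val spark = SparkSession.builder()
      .appName("SalesRollup")
      .enableHiveSupport()
      .getOrCreate()

    // Read an existing Hive table into a DataFrame (table name is hypothetical)
    val sales = spark.table("warehouse.sales")

    // Aggregate with the DataFrame API
    val byRegion = sales
      .groupBy("region")
      .agg(sum("amount").as("total_amount"), count("*").as("txn_count"))

    // Write the result back to HDFS as Parquet, partitioned by region
    byRegion.write
      .mode("overwrite")
      .partitionBy("region")
      .parquet("/data/output/sales_by_region")

    spark.stop()
  }
}
```

The same aggregation could equally be expressed through spark.sql(...) against the registered Hive table, which is the Spark SQL path referenced in the bullets above.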
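The Kafka-to-HDFS pipeline bullet could be realized in several ways; the sketch below assumes the DStream-based spark-streaming-kafka-0-10 integration, with the broker list, topic name, consumer group, and HDFS prefix as placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfs")
    val ssc = new StreamingContext(conf, Seconds(30))

    // Consumer settings; broker list and group id are placeholders
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092,broker2:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "hdfs-ingest",
      "auto.offset.reset" -> "latest"
    )

    // Direct stream from the (hypothetical) "events" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
    )

    // Persist each micro-batch of message values under a timestamped HDFS prefix
    stream.map(record => record.value)
      .saveAsTextFiles("hdfs:///data/raw/events/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Checkpointing and exactly-once offset management are omitted here for brevity.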
Confidential, Atlanta, GA
Sr. Hadoop Developer
Roles & Responsibilities:
- Worked extensively on Hadoop Components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, YARN and Map Reduce programming.
- Developed Map-Reduce programs to clean and aggregate the data.
- Responsible for building scalable distributed data solutions using Hadoop and Spark.
- Worked hands-on with the ETL process using Java.
- Implemented Hive Ad-hoc queries to handle Member data from different data sources such as Epic and Centricity.
- Implemented Hive UDFs and did performance tuning for better results (a UDF sketch follows this role's environment line).
- Analyzed the data by performing Hive queries and running Pig Scripts.
- Involved in loading data from the UNIX file system to HDFS.
- Implemented optimized map joins to get data from different sources to perform cleaning operations before applying the algorithms.
- Used Sqoop to import and export data between the Netezza and Oracle databases and HDFS/Hive.
- Implemented a POC to introduce Spark transformations.
- Worked with the NoSQL databases HBase and MongoDB to create tables and store data.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, streamed data using Flume, and loaded it into HDFS.
- Worked on transforming data from MapReduce into HBase as bulk operations.
- Implemented CRUD operations on HBase data using the Thrift API to get real-time insights (a client-API sketch follows this role's environment line).
- Installed the Oozie workflow engine to run multiple MapReduce, Hive, Impala, ZooKeeper, and Pig jobs, which run independently based on time and data availability.
- Developed workflow in Oozie to manage and schedule jobs on Hadoop cluster for generating reports on nightly, weekly and monthly basis.
- Used Zookeeper to manage Hadoop clusters and Oozie to schedule job workflows.
- Implemented test scripts to support test driven development and continuous integration.
- Involved in data ingestion into HDFS using Apache Sqoop from a variety of sources, using connectors such as JDBC and import parameters.
- Coordinated with Hadoop admins during deployments to production.
- Developed Pig Latin scripts to extract data from log files and store it in HDFS. Created User Defined Functions (UDFs) to pre-process data for analysis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Participated in design and implementation discussions for developing the Cloudera 5 Hadoop ecosystem.
- Used JIRA and Confluence to update tasks and maintain documentation.
- Worked in Agile development environment in sprint cycles of two weeks by dividing and organizing tasks. Participated in daily scrum and other design related meetings.
- Created final reports of the analyzed data using Apache Hue and the Hive browser and generated graphs for study by the data analytics team.
- Used Sqoop to export the analyzed data to a relational database for analysis by the data analytics team.
Environment: Hadoop, Cloudera Hadoop, Map Reduce, Hive, Pig, Sqoop, Flume, HBase, Java, JSON, Spark, HDFS, YARN, Oozie Scheduler, Zookeeper, Mahout, Linux, UNIX, ETL, MySQL
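The Hive UDF bullet above does not include source, so the sketch below is a generic example, written in Scala to match the other sketches here (the original UDFs were in Java). The class name and behavior (normalizing a member identifier) are illustrative only.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Simple Hive UDF: normalizes a member identifier by trimming whitespace
// and upper-casing it. Hive locates the evaluate() method by reflection.
class NormalizeMemberId extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase)
  }
}
```

After packaging the class into a jar, the function would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION statements (jar path and function name hypothetical).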
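The CRUD bullet above references the Thrift API; purely for illustration, the sketch below exercises the same create/read/delete operations through the standard HBase 1.x client API from Scala instead. Table, row key, column family, and qualifier names are placeholders.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Delete, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object MemberHBaseCrud {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()            // picks up hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf("members"))   // hypothetical table

    try {
      // Create / update a row
      val put = new Put(Bytes.toBytes("member-001"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("name"), Bytes.toBytes("Jane Doe"))
      table.put(put)

      // Read the row back
      val result = table.get(new Get(Bytes.toBytes("member-001")))
      val name = Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("name")))
      println(s"member-001 name = $name")

      // Delete the row
      table.delete(new Delete(Bytes.toBytes("member-001")))
    } finally {
      table.close()
      connection.close()
    }
  }
}
```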
Confidential, Birmingham, AL
Sr. Hadoop Developer
Roles & Responsibilities:
- Designed, installed, and configured Flume, Hive, Pig, and Oozie on the Hadoop cluster.
- Designed workflow by scheduling Hive processes for Log file data, which is streamed into HDFS using Flume.
- Effectively used Sqoop to transfer data between databases and HDFS.
- Imported data from open data sources into Amazon S3 and other private clusters.
- Developed scripts to automate the creation of Sqoop Jobs for various workflows.
- Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
- Developed many queries in HiveQL and extracted the information required by the business.
- Developed scripts to automate the creation of Hive tables and partitions.
- Developed MapReduce programs to extract and transform the data sets and results were exported back to RDBMS using Sqoop.
- Developed MapReduce jobs for analyzing the data stored in HDFS by performing map-side and reduce-side joins (a map-side join sketch follows this role's environment line).
- Involved in implementing High Availability and automatic failover infrastructure to overcome the single point of failure for the NameNode.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Designed and developed the framework to log information for auditing and failure recovery.
- Designed and developed ETL workflows using Oozie for business requirements, including automating the extraction of data from a MySQL database into HDFS using Sqoop.
Environment: HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, Flume
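One bullet above mentions map-side joins in MapReduce; the sketch below shows one common pattern, written in Scala against the Hadoop MapReduce API: a small lookup file is cached in memory during setup() and joined against each record in map(), so no reduce phase is needed. The file path and record layouts are assumptions.

```scala
import java.io.{BufferedReader, InputStreamReader}
import scala.collection.mutable
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Mapper

// Map-side join: the small lookup table is held in memory per mapper,
// so each fact record is enriched without a shuffle/reduce phase.
class MapSideJoinMapper extends Mapper[LongWritable, Text, Text, Text] {

  private val userById = mutable.Map.empty[String, String]

  override def setup(context: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
    // Hypothetical small dimension file: userId,userName per line
    val fs = FileSystem.get(context.getConfiguration)
    val reader = new BufferedReader(new InputStreamReader(fs.open(new Path("/data/lookup/users.csv"))))
    var line = reader.readLine()
    while (line != null) {
      val parts = line.split(",", 2)
      if (parts.length == 2) userById(parts(0)) = parts(1)
      line = reader.readLine()
    }
    reader.close()
  }

  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
    // Hypothetical fact record layout: orderId,userId,amount
    val fields = value.toString.split(",")
    if (fields.length >= 3) {
      val userName = userById.getOrElse(fields(1), "UNKNOWN")
      context.write(new Text(fields(0)), new Text(s"$userName,${fields(2)}"))
    }
  }
}
```

The corresponding driver would set job.setNumReduceTasks(0) so the joined records are written directly by the mappers.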
Confidential, Columbus, OH
Sr. ETL Data Stage Developer
Roles & Responsibilities:
- Designed jobs involving various cross-reference lookups and joins, as well as shared containers that can be reused in multiple jobs.
- Created job-level sequencers to include multiple jobs and a layer-level sequence that includes all job-level sequences.
- Involved in the designing of marts and dimensional and fact tables.
- Extensively used parallel stages such as Row Generator, Column Generator, Head, and Peek for development and debugging purposes.
- Worked on the ETL process using Python and Java.
- Migrated jobs from DataStage 7.5 to 8.1, developed new DataStage jobs using the DataStage/QualityStage Designer, and imported and exported repositories across projects.
- Used DataStage Designer for developing jobs and DataStage Director to view the log file for execution errors.
- Created DataStage parallel jobs to load fact and dimension tables.
- Wrote shell scripts to run DataStage jobs and PL/SQL blocks.
- Wrote SQL queries to check the data in the source system as well as in staging.
- Used Parallel Extender to split the data into subsets and utilized Lookup, Sort, Merge, and other stages to improve job performance.
- Worked with DataStage tools such as DataStage Designer and DataStage Director to develop jobs and view the logs for errors.
Environment: IBM WebSphere DataStage 8.1, DataStage 7.5.2, Python, UNIX Shell Scripting (Korn/KSH), SQL, Oracle 9i/10g, UNIX, Windows XP
Confidential, Kalamazoo, MI
Sr. Java Developer
Roles & Responsibilities:
- The application was developed in J2EE using an MVC based architecture.
- Implemented the MVC design using the Struts 1.3 framework, JSP custom tag libraries, and various in-house custom tag libraries for the presentation layer.
- Created tile definitions, Struts-Config files, validation files, and resource bundles for all modules using the Struts framework.
- Wrote prepared statements and called stored procedures using callable statements in MySQL.
- Executed SQL queries to perform CRUD operations on customer records.
- Gathered requirements and then developed complex workflows involving templates and OpenDeploy.
- Used Eclipse 6.0 as the IDE for application development and configured the Struts framework to implement MVC design patterns.
- Validated all forms using Struts validation framework and implemented Tiles framework in the presentation layer.
- Designed and developed GUI using JSP, HTML, DHTML and CSS. Worked with JMS for messaging interface.
- Used Hibernate for handling database transactions and persisting objects; deployed the entire project on the WebLogic application server.
- Part of the team involved in the design and coding of the data capture, presentation, and component templates.
- Developed and configured templates to capture and generate multilingual content; with this approach, the North US branch content is encoded in Big5.
- Used WebSphere as the application server for deployment.
- Used Web services for transmission of large blocks of XML data over HTTP.
Environment: Java/J2EE, Oracle 10g, SQL, PL/SQL, JSP, EJB, Struts, Hibernate, WebLogic 8.0, HTML, AJAX, Java Script, JDBC, XML, JMS, XSLT, UML, JUnit, Log4j, Eclipse 6.0