
Hadoop/Spark Developer Resume

Atlanta, GA

SUMMARY:

  • 7+ years of IT experience in the full software development life cycle (SDLC), including requirement gathering, analysis, design, development, writing technical specifications, and interface development using object-oriented methodologies and RDBMSs such as MySQL and Oracle.
  • More than 4 years of experience in developing and administering Hadoop and Big Data technologies such as HDFS, MapReduce, Pig, Hive, Oozie, Flume, HCatalog, Sqoop, ZooKeeper, and NoSQL databases like Cassandra.
  • Diverse experience utilizing Java tools in business, Web, and client-server environments including Java Platform, Enterprise Edition (Java EE), Enterprise Java Bean (EJB), Java Server Pages (JSP), Java Servlets, and Java database Connectivity (JDBC) technologies.
  • In-depth and extensive knowledge of the Hadoop architecture and its components, including HDFS, MapReduce, NameNode, DataNode, Secondary NameNode, JobTracker and TaskTracker.
  • Experience in designing, developing and deploying Hadoop ecosystem technologies such as HDFS, ZooKeeper, MapReduce, Pig, Hive, Oozie, Flume, Hue and Sqoop.
  • Implemented MapReduce 2.0 (MRv2) or YARN on the Hadoop cluster.
  • Experience in deploying Spark to run with YARN and connect to Hive Server to run Spark-SQL commands.
  • Experience in working with multiple file formats (JSON, XML, Sequence Files and RC Files) using SerDes.
  • Strong hands-on experience in analyzing data by writing Hadoop MapReduce jobs in Java.
  • Expertise in optimizing MapReduce algorithms using Combiners, Partitioners and the Distributed Cache to deliver the best results.
  • Hands-on experience in analyzing data by developing custom UDFs and scripts in Pig.
  • Experience in using HCatalog to transfer data between Pig and Hive.
  • Hands-on experience in implementing and optimizing HiveQL queries through partitioning and custom UDFs.
  • Experience in importing streaming logs and aggregating the data into HDFS through Flume.
  • Hands-on experience in scheduling the Oozie workflow engine to run multiple Flume, MapReduce, Hive and Pig jobs.
  • Experience in working with relational databases, integrating them with the Hadoop infrastructure, and piping the data to HDFS using Sqoop.
  • Experience in administering the Linux systems to deploy Hadoop cluster and monitoring the cluster using Nagios and Ganglia.
  • Hands on experience in writing custom Shell Scripts for system management and to automate redundant tasks.
  • Experience in writing Python, Shell and Java scripts to monitor the Cluster usage metrics.
  • Experience in integrating Hadoop job workflows using Pentaho Data Integration tool.
  • Experience in integrating Hadoop with enterprise BI tools such as Tableau and Pentaho Business Analytics to generate reports.
  • Experience in installing and customizing Pentaho Data Integration and Pentaho Business Analytics tools to use Active Directory, LDAP and custom header authentication.
  • Hands-on experience in implementing JUnit, MRUnit and PigUnit test cases.
  • Experience in managing Maven Plug-in in Eclipse for programming.
  • Hands on experience in scheduling the jobs through Tidal Enterprise Scheduler (TES).
  • Experience in Installing, configuring and loading data to Elasticsearch Cluster.
  • Experience in installing the Static Kibana server to connect to Elasticsearch cluster and generate reports.
  • Well versed knowledge and experience in public cloud environment - Amazon Web Services (AWS), Rackspace and on private cloud infrastructure - OpenStack cloud platform.
  • Experience in writing OpenStack Heat templates in YAML to auto-deploy Hadoop cluster servers with pre-cluster configurations, including required tools such as Hive, Pig, Sqoop and HBase.
  • Hands-on knowledge of writing and submitting jobs in Amazon Elastic MapReduce (EMR).
  • Experience in source control using version control tools like GIT, SVN.
  • Working experience in configuring and using the automation tool Rundeck.
  • Working experience in configuring the continuous integration tool Jenkins.
  • Hands-on experience in load balancing using Apache load balancers with mod_proxy.
  • Good programming experience in SQL, PL/SQL, complex stored procedures and triggers.
  • Experience in working with multiple RDBMS including Oracle 11i/9i/8i, PostgreSQL, SQL Server and MS Access.
  • Experienced in installing, configuring, and administering multi-node Hadoop clusters of major Hadoop distributions: Cloudera, MapR and Hortonworks.
  • Hands on experience in doing Capacity planning and Cluster building based on the requirement.
  • Fluid understanding of multiple programming languages, including Java, C, C++, HTML, XML, and PHP.
  • Ambitious self-starter who plans, prioritizes, and manages multiple technical tasks within deadline-driven environments.
  • Experience in Installing and monitoring standalone multi-node Clusters of Kafka and Storm.
  • Designed and implemented topic configurations for new Kafka clusters across all environments.
  • Successfully secured the Kafka cluster with Kerberos.
  • Implemented Kafka security features using SSL without Kerberos; for finer-grained security, set up Kerberos with users and groups to enable more advanced security features.
  • Excellent communication skills; adept at building strong working relationships with coworkers and management.
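
The combiner-based MapReduce optimization noted above can be illustrated with a small, self-contained Python simulation (plain Python standing in for the actual Hadoop Java code; the word-count data is hypothetical):

```python
from collections import Counter

def mapper(line):
    """Emit (word, 1) pairs, as a MapReduce mapper would."""
    return [(word, 1) for word in line.split()]

def combiner(pairs):
    """Pre-aggregate one mapper's output locally, cutting shuffle volume."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return list(counts.items())

def reducer(all_pairs):
    """Final aggregation across all (already combined) mapper outputs."""
    counts = Counter()
    for key, value in all_pairs:
        counts[key] += value
    return dict(counts)

# Simulate two mapper tasks, each followed by a local combiner
lines = ["hive pig hive", "pig hive oozie"]
combined = [pair for line in lines for pair in combiner(mapper(line))]
result = reducer(combined)
# result == {"hive": 3, "pig": 2, "oozie": 1}
```

The combiner does not change the final result; it only shrinks the intermediate data each mapper ships to the reducers, which is where the performance gain comes from.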

TECHNICAL PROFICIENCIES:

Big Data Distributions: Apache Hadoop, Cloudera Manager, Cloudera Hadoop Distributions

Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, Flume, Sqoop, Oozie, ZooKeeper, Spark

Monitoring Tools: Ganglia, Nagios, Cloudera Manager, DataStax OpsCenter

Security: Kerberos

Databases: Oracle, MySQL, PostgreSQL

NoSQL Database: Cassandra, HBase, MongoDB

Languages: Shell scripting, C, C++, Java, JavaScript, JSP, HTML, XML, SQL

Technologies: J2EE, JDBC, Servlets, JSP, Web Services, XML

Tools: Eclipse, NetBeans IDE, Apache Tomcat, Visual Basic, Microsoft Office Suite, Adobe Dreamweaver, Stash, GitHub

Operating Systems: Windows XP/7/8, Macintosh, Ubuntu, Linux (CentOS 6)

WORK EXPERIENCE:

Confidential, Atlanta GA

Hadoop/Spark Developer

Responsibilities:

  • Hands-on experience in working with Spark and Spark SQL.
  • Worked on querying data using Spark SQL on top of the Spark engine.
  • Experience in managing and monitoring a Hadoop cluster using Cloudera Manager.
  • Developed HQL scripts for performing transformation logic and for loading data from the staging zone to the landing and semantic zones.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Used Sqoop to transfer data between DB2 and HDFS.
  • Wrote transformations for various business logic in Hive SQL using Spark.
  • Involved in loading .csv files into Hive databases according to the business logic.
  • Wrote transformations for tables of about 160 columns and upserted the results into a DB2 database using Spark.
  • Created and worked on large DataFrames with schemas of more than 300 columns.
  • Created strongly typed Datasets.
  • Wrote functions in Scala as needed for column validation and data cleansing.
  • Created UDFs when required and registered them for use throughout the application.
  • Worked with various file formats such as Parquet, ORC and Avro.
  • Developed quality code adhering to Scala coding standards and best practices.
  • Experienced in performance tuning of Spark jobs: setting the right batch interval, choosing the correct level of parallelism, tuning memory, changing configuration properties and using broadcast variables.
  • Involved in Hive data cleansing using the Eclipse IDE: trimming data, joining columns, performing aggregations on columns (e.g., percentages), and trimming leading zeros from columns in Hive tables; performed transformations on columns and stored the data into Hive databases.
  • Wrote case statements using HiveQL.
  • Used Spark for Parallel data processing and better performance.
  • Experience in developing scalable solutions using the NoSQL database Cassandra.
  • Involved in writing transformations for Hive tables using Spark and upserting the results into DB2.
  • Worked closely on Cassandra loading activity for history and incremental loads from Oracle databases, resolving loading issues and tuning the loader for optimal performance.
  • Experience in migrating several databases from an on-premises data center to Cassandra.
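
The trimming and leading-zero cleansing steps described above can be sketched in plain Python (an illustrative stand-in for the Scala/Spark functions; column names and values are hypothetical):

```python
def clean_column(value):
    """Trim whitespace and strip leading zeros, as in the cleansing steps."""
    value = value.strip()
    stripped = value.lstrip("0")
    # Keep a single zero rather than an empty string for all-zero inputs
    return stripped if stripped else ("0" if value else "")

def clean_row(row):
    """Apply the cleansing function to every column of a row."""
    return {col: clean_column(val) for col, val in row.items()}

row = {"account_id": "000123", "region": "  east  ", "balance": "0"}
cleaned = clean_row(row)
# cleaned == {"account_id": "123", "region": "east", "balance": "0"}
```

In the actual pipeline the same per-column logic would be registered as a Spark UDF and applied across a DataFrame rather than a single dict.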

Environment: Hive, Hadoop cluster, Sqoop, Spark 2.1.1, Scala, Eclipse Scala IDE, Cassandra, Toad DB2

Confidential, Franklin, WI

Senior Big data Analyst/Developer

Responsibilities:

  • Analyzed the existing system process.
  • Prepared design documents for the models specified above.
  • Implemented data preparation, scoring and trend analysis.
  • Developed a common export framework to transfer data to different target systems (COM, EXE).
  • Prepared an in-house comparator tool using MapReduce for output data validation by the Data Science and Engineering team.
  • Led quality assurance reviews, code inspections, and walkthroughs of the developers' code.
  • Acted as the technical interface to the development team for external groups.
  • Provided training for team members and cross-team members.
  • Prepared validation scripts to check source and target data after ingestion.
  • Implemented scoring logic using Python and Hive scripts.
  • Created and configured coordinators, workflows and bundles in Oozie.
  • Deployed jar files to EC2 instances after development.
  • Worked on a 62-node physical cluster on Hadoop 1.x and a 31-node cluster on Hadoop 2.x (YARN).
  • Worked on a 10-node cluster in AWS for the Dev and QA environments.
  • Involved in setting up IAM (Identity and Access Management) roles.
  • Involved in the network setup of the physical cluster with the admin team.
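
The Oozie coordinator work mentioned above follows a standard shape; a minimal illustrative coordinator definition is shown below (the app name, dates and paths are placeholders, not the project's actual values):

```xml
<!-- Illustrative daily coordinator; names, dates and paths are placeholders -->
<coordinator-app name="daily-scoring" frequency="${coord:days(1)}"
                 start="2016-01-01T00:00Z" end="2016-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <app-path>${nameNode}/apps/scoring/workflow.xml</app-path>
    </workflow>
  </action>
</coordinator-app>
```

A bundle definition would then group several such coordinators so they can be started and stopped as one unit.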

Environment: Hadoop, HDFS, Hive, Sqoop, Java, Spark, Python, AWS, Cloudera, Zookeeper and HBase, DB2, Teradata, Linux

Confidential, Bloomington, IL

Hadoop Developer

Responsibilities:

  • Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
  • Deployed and analyzed large chunks of data using HIVE as well as HBase.
  • Provided support to data analysts in running Pig and Hive queries.
  • Used Hive and Python at various stages of the project lifecycle.
  • Created business intelligence dashboards in Tableau for reconciliation and data verification.
  • Re-designed and developed a critical ingestion pipeline to process over 200 TB of data.
  • Imported and exported data between MySQL/Oracle and Hive using Sqoop.
  • Imported and exported data between MySQL/Oracle and HDFS.
  • Designed and built many applications to deal with vast amounts of data flowing through multiple Hadoop clusters, using Pig Latin and Java-based map-reduce.
  • Specified the cluster size, allocated resource pools, and configured the Hadoop distribution by writing specifications in JSON format.
  • Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it. Exported result sets from Hive to MySQL using shell scripts.
  • Created models and customized data analysis tools in Python and MATLAB.
  • Delivered data analysis projects using Hadoop-based tools and the Python data science stack; developed new data analyses and visualizations in Python.
  • Handled importing of data from various data sources, performed transformations, and imported and exported data into HDFS and Hive using Sqoop.
  • Gained good experience with NoSQL databases.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries, which run internally as MapReduce jobs.
  • Involved in creating tables and in partitioning and bucketing of tables.
  • Good understanding of and related experience with Hadoop stack internals, Hive, Pig and MapReduce.
  • Interacted with the Business users to identify the process metrics and various key dimensions and measures. Involved in the complete life cycle of the project.
  • Created Mapplets, reusable transformations and used them in different mappings. Created Workflows and used various tasks like Email, Event-wait and Event-raise, Timer, Scheduler, Control, Decision, Session in the workflow manager.
  • Made use of Post-Session success and Post-Session failure commands in the Session task to execute scripts needed for clean up and update purposes.
  • Prepared ETL mapping Documents for every mapping and Data Migration document for smooth transfer of project from development to testing environment and then to production environment.
  • Worked with the reporting team to help them understand the user requirements for the reports and the measures on them.
  • Migrated repository objects, services and scripts from development environment to production environment. Extensive experience in troubleshooting and solving migration issues and production issues.
  • Actively involved in production support. Implemented fixes/solutions to issues/tickets raised by user community.
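
The reconciliation checks behind dashboards like those mentioned above can be sketched in plain Python (illustrative only; the key and row contents are hypothetical):

```python
def reconcile(source_rows, target_rows, key):
    """Compare source and target extracts by key: count matches and list
    records missing from, or unexpected in, the target."""
    src = {row[key] for row in source_rows}
    tgt = {row[key] for row in target_rows}
    return {
        "matched": len(src & tgt),
        "missing_in_target": sorted(src - tgt),
        "unexpected_in_target": sorted(tgt - src),
    }

source = [{"id": 1}, {"id": 2}, {"id": 3}]
target = [{"id": 1}, {"id": 3}, {"id": 4}]
report = reconcile(source, target, "id")
# report == {"matched": 2, "missing_in_target": [2], "unexpected_in_target": [4]}
```

In practice the two row sets would come from Hive and the downstream store, and the report would feed the Tableau reconciliation view.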

Environment: Hadoop, HDFS, Hive, Sqoop, Java, Spark, Python, AWS, Cloudera, Zookeeper and HBase, Informatica Power Center, DB2, Teradata, UNIX, Tableau.

Confidential, CA

Hadoop Admin

Responsibilities:

  • Actively participated with the development team to meet the specific customer requirements and proposed effective Hadoop solutions.
  • Involved in extracting the data from various sources into Hadoop HDFS for processing.
  • Worked on streaming the data into HDFS from web servers using flume.
  • Implemented custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.
  • Responsible for collecting Data required for testing various Map Reduce applications from different sources.
  • Developed Map-Reduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
  • Implemented complex MapReduce programs to perform map-side joins using the distributed cache.
  • Effectively used Sqoop to transfer data between databases and HDFS.
  • Created Hive tables as internal or external per requirement, defined with appropriate static and dynamic partitions for efficiency.
  • Implemented UDFs, UDAFs and UDTFs in Java for Hive to process data in ways that cannot be handled by Hive's built-in functions.
  • Wrote Pig scripts for advanced analytics on the data for recommendations.
  • Designed and implemented Pig UDFs for evaluating, filtering, loading and storing data.
  • Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java map-reduce, Hive and Sqoop as well as system specific jobs.
  • Developed a suite of unit test cases for Mapper, Reducer and Driver classes using the MRUnit testing library.
  • Implemented unit testing with the help of the MRUnit and JUnit tools.
  • Debugged and troubleshot issues in the development and test environments.
  • Worked with BI teams in generating the reports on Tableau.
  • Involved in Minor and Major Release work activities.

Environment: HDFS, Oozie, Sqoop, Pig, Hive, Flume, Shell scripting, MapReduce, Eclipse, Tidal, Stash

Confidential, Portland, ME

Hadoop Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop.
  • Implemented Sqoop to transfer data between RDBMS databases and HDFS.
  • Involved in loading data from UNIX file system to HDFS.
  • Worked on streaming the data into HDFS from web servers using Flume.
  • Applied MapReduce framework jobs in java for data processing by installing and configuring Hadoop, HDFS.
  • Performed data analysis in Hive by creating tables, loading them with data and writing Hive queries, which run internally as MapReduce jobs.
  • Worked on implementing custom Hive UDFs and UDAFs in Java to process and analyze data.
  • Worked on custom Pig Loader and Storage classes to handle semi-structured and unstructured data.
  • Supported Hadoop developers and assisted in the optimization of MapReduce jobs, Pig Latin scripts and Hive scripts.
  • Automated jobs using the Oozie workflow engine to chain together shell scripts, Flume, MapReduce jobs, and Hive and Pig scripts.
  • Used the Tidal Enterprise Scheduler to schedule and automate the daily jobs.
  • Implemented unit testing with the help of the MRUnit and JUnit tools.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
  • Worked with big data analysts, designers and scientists in troubleshooting MapReduce job failures and issues with Hive, Pig, Flume, etc.
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive.
  • Developed test plans, test scripts and test environments to understand and resolve defects.

Environment: MapReduce, Oozie, Sqoop, Pig, Hive, Flume, Shell scripting, Eclipse, Tidal, GitHub

Confidential

Associate Java Developer

Responsibilities:

  • Participated in planning and development of UML diagrams like use case diagrams, object diagrams and class diagrams to represent a detailed design phase.
  • Designed and developed user interface static and dynamic web pages using JSP, HTML and CSS.
  • Identified and fixed transactional issues due to incorrect exception handling and concurrency issues due to unsynchronized block of code.
  • Used JavaScript to develop client-side validation scripts.
  • Developed SQL scripts for batch processing of data.
  • Created and implemented stored procedures, functions, triggers, using SQL.
  • Incorporated custom logging mechanism for tracing errors, resolving all issues and bugs before deploying the application.
  • Performed unit testing, system testing and user acceptance test.
  • Worked with QA to move the application to production environment.
  • Prepared technical reports & documentation manuals during the program development.
