Hadoop/Spark Developer Resume

Windsor, CT

SUMMARY

  • Around 8 years of experience in the IT industry, including 5 years of hands-on experience in Big Data Hadoop and in the development, implementation and maintenance of various applications using Java and J2EE technologies.
  • In-depth knowledge of Hadoop architecture components such as HDFS, JobTracker, TaskTracker, NameNode and DataNode, and of the MapReduce programming paradigm.
  • Strong knowledge of configuring NameNode High Availability and NameNode federation.
  • Working experience with Apache Hadoop ecosystem components such as MapReduce, HDFS, Hive, Sqoop, Pig, Oozie, Flume, HBase, ZooKeeper and Impala.
  • Experience in building, maintaining and monitoring multiple Hadoop clusters of different sizes and configuration setups of Rack Topology clusters in Hadoop.
  • Experience in managing and reviewing Hadoop log files and analyzing data using Pig Latin, HiveQL, HBase and custom MapReduce programs in Java.
  • Extended Hive and Pig core functionality by writing User Defined Functions (UDFs).
  • Experience in building, configuring, managing and supporting Cloudera’s Hadoop platform, including CDH 4 and CDH 5 clusters.
  • Very good understanding of MapReduce 1 (JobTracker) and MapReduce 2 (YARN).
  • Expertise in Oracle, MySQL, MS SQL Server, SQLite, API, HTML, DHTML and other Internet Technologies.
  • Experience in creating complex SQL Queries and SQL tuning, writing PL/SQL blocks like stored procedures, Functions, Cursors, Index, Triggers and Packages.
  • Good understanding of XML methodologies, including Web Services.
  • Experience in NoSQL databases such as HBase and Cassandra.
  • Good knowledge of Spark programming using Scala and Python.
  • Experience in using Kerberos for securing Hadoop cluster.
  • Experience in integrating AD/LDAP users with Ambari and Ranger.
  • Very good expertise in High level Linux Scripting and Python Scripting.
  • Hands-on experience in provisioning and managing multi-tenant Hadoop clusters on public cloud environments such as Amazon Web Services (AWS).
  • Strong knowledge of the Software Development Life Cycle (SDLC).
  • Experienced in Agile (SCRUM) methodology and Iterative Waterfall model.
  • Experience working in multicultural environments, both within a team and individually, as project requirements dictate.
  • Excellent interpersonal, leadership, communication and creative skills, critical thinker by nature, research minded, goal oriented with problem solving.

TECHNICAL SKILLS

Big Data Ecosystem: HDFS, Map Reduce, HBase, Pig, Hive, Sqoop, Flume, Impala, Oozie, Zookeeper, Amazon Web Services, Redshift, Talend Big Data Studio, Spark, Storm, Kafka, Ambari, and Scala.

Programming Languages: C, C++, Java, Scala, SQL, PL/SQL, COBOL, REXX, CICS and Python.

SQL Database: MySQL, DB2, Oracle Database, Teradata and Sybase.

NoSQL Database: HBase and Cassandra.

Web Technologies: HTML, HTML5, XML and CSS.

Version Control Systems: Git, SVN, CVS.

Operating Systems: Linux, Windows and MacOS.

Methodologies: Agile and Waterfall.

Tools: Tableau, Eclipse, ServiceNow, MS Visual Studio, Xcode, NetBeans and Cisco Network Simulator.

PROFESSIONAL EXPERIENCE

Confidential, Windsor, CT

Hadoop/Spark Developer

Responsibilities:

  • Responsible for coordinating end to end project management related activities.
  • Involved in Design and Development of technical specification document using Hadoop.
  • Migrated the required data from Oracle and MySQL databases to HDFS using Sqoop.
  • Worked with the Architect team to design Spark model for the existing MapReduce model.
  • Designed batch processing jobs using Spark with Scala to increase speeds ten-fold compared to MapReduce jobs.
  • Developed Hive and Pig UDFs using Scala and Python.
  • Developed Spark SQL jobs to load tables into HDFS and run select queries on top of them.
  • Developed POCs using Scala, Spark SQL and MLlib libraries along with Kafka and other tools as required, then deployed them on the YARN cluster.
  • Worked on loading data into Spark RDD and performed in-memory data computation to generate the output response.
  • Involved in development of Spark and Spark SQL scripts to migrate data from Teradata into AWS Redshift.
  • Involved in automating the data pull from Amazon S3 to HDFS.
  • Coordinated with the team working on Impala to run SQL queries on Hive tables.
  • Installed and configured Apache Hadoop on multiple nodes on AWS EC2.
  • Involved in developing Spark scripts using different Scala and Python APIs.
  • Involved in performance tuning, backup/recovery, monitoring and disaster recovery strategy and production support for Hadoop ecosystem.
  • Worked with the administration team to configure Talend and resolved different issues using Talend.
  • Diagnosed and rectified failures using the log files of Hadoop ecosystem components.
  • Configured and maintained different topologies in the Storm cluster and deployed them on a regular basis.
  • Managed and monitored Hadoop cluster using Cloudera Manager.
  • Defined job workflows in Oozie according to their dependencies.
  • Maintained the cluster coordination process using ZooKeeper.
  • Coordinated with BI team in generating the reports and designing ETL workflows on Tableau BI tool.

Environment: Hadoop(Hive, Impala, Map Reduce, HBase, Spark, Pig, Sqoop, Storm, Flume, Oozie), MySQL, Tableau, Amazon S3, AWS EC2, Scala, Talend, Java, and Python.

Confidential, East Hanover, NJ

Hadoop Developer

Responsibilities:

  • Proactively monitored systems and services, and worked on architecture design and implementation of Hadoop deployment, configuration management, backup and disaster recovery systems and procedures.
  • Worked with the system engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Worked with admin team in designing and upgrading CDH 3 to CDH 4.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Installed and configured Flume, Hive, Pig, Sqoop and Oozie on Hadoop cluster.
  • Shared responsibility for administering Hadoop, Hive and Pig.
  • Created Hive external tables to store the Pig script output and worked on them so that data analysts could meet the business requirements.
  • Managed and reviewed Hadoop log files.
  • Created Hive UDFs with MapReduce using Python.
  • Used Flume to collect, aggregate and store web log data from different sources such as web servers, mobile devices and network devices, and pushed it to HDFS.
  • Analyzed the web log data using HiveQL to extract the number of unique visitors, page views, visit duration and most purchased product on the website per day.
  • Involved in complete implementation of ETL logic.
  • Troubleshot, debugged and fixed Talend-specific issues while maintaining the health and performance of the ETL environment.
  • Handled Active Directory integration with Ambari and Ranger.
  • Exported the analyzed data to relational databases using Sqoop for visualization using Tableau and to generate reports for BI team.
  • Developed Pig scripts to convert data from Avro to Text file format.
  • Involved in analyzing system failures, identifying root causes and recommending courses of action.

Environment: Hadoop(MapReduce, Hive, Impala, Flume, Oozie, Sqoop, Pig, Ambari, Ranger), Talend, Shell Scripting, Python, Java, and Tableau.

Confidential, Tysons Corner, VA

Big Data/Hadoop Developer

Responsibilities:

  • Installed Hadoop, MapReduce, HDFS and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Worked on evaluation and analysis of Hadoop cluster and different Big Data analytic tools including Pig, HBase and Sqoop.
  • Understood the business needs, analyzed functional specifications and mapped them to the design and development of MapReduce programs and algorithms.
  • Wrote Pig and Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data. Hands-on experience with Pig and Hive User Defined Functions (UDFs).
  • Executed Hadoop ecosystem tools and applications through Apache HUE.
  • Optimized Hadoop MapReduce code, Hive/Pig scripts for better scalability, reliability and performance.
  • Involved in setting up HBase to use HDFS.
  • Developed Hive queries to analyze reducer output data.
  • Developed Pig Latin scripts to extract data from source systems.
  • Developed Oozie workflows for the application execution.
  • Performed data migration from legacy RDBMS databases to HDFS using Sqoop.
  • Responsible for developing efficient MapReduce programs and Hive scripts on the AWS cloud.
  • Implemented Hive tables and HQL queries for reports. Wrote and used complex data types in Hive. Stored and retrieved data using HQL in Hive.
  • Used ZooKeeper to provide coordination services to the cluster.
  • Involved in setting up Kerberos on the Hadoop clusters.
  • Highly involved in designing the next generation data architecture for the unstructured data.
  • Managed and worked on a live 84-node Hadoop cluster.

Environment: Hadoop(MapReduce, Hive, HBase, Oozie, Pig, HUE, Sqoop, Flume), Java, Shell Scripting, Kerberos and DB2.

Confidential, Hartford, CT

Hadoop Developer

Responsibilities:

  • Responsible for understanding the scope of the project and gathering requirements.
  • Worked with the application team to install operating system, Hadoop updates, patches and version upgrades as required.
  • Handled importing data from multiple sources using Sqoop, and performed transformations using Hive and MapReduce to load data into HDFS.
  • Deployed a multi-node Hadoop cluster with different Big Data analytic tools including Pig, HBase and Sqoop.
  • Created a data warehouse using Hive.
  • Analyzed large data sets using Hive queries and Pig scripts.
  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Involved in creating Hive tables, loading and analyzing data using Hive queries.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Developed the MapReduce pipeline jobs to process the data and create necessary HFiles.
  • Developed multiple MapReduce jobs for data cleaning and processing.
  • Involved in loading data from Linux file system to HDFS.
  • Gathered data using Star and Snowflake schemas in initial days of project.
  • Extracted files from CouchDB through Sqoop, placed them in HDFS and processed them.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML format data.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Responsible for managing data from different sources.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Created and maintained technical documentation for Hadoop clusters and for execution of Hive queries and Pig scripts.
  • Reviewed and managed Hadoop log files to detect failures.
  • Used Tableau Server to provide BAs with end reports based on their requirements.
  • Involved in managing users and their dashboards in Tableau.

Environment: Hadoop(Pig, Hive, MapReduce, Sqoop, HBase), Linux and Tableau.

Confidential

Tableau Developer

Responsibilities:

  • Tested, cleaned and standardized data to meet the business standards using SQL statements.
  • Fine-tuned SQL queries for maximum efficiency and performance.
  • Developed, customized, maintained, modified and managed tables, graphs, documents and slides for the effective creation of data.
  • Built and published customized interactive reports and dashboards using Tableau Server.
  • Built effective connectivity between traditional RDBMS and Tableau Server.
  • Created batch stored procedures for the report scheduler on monthly, weekly and daily schedules.
  • Created and developed parameterized, drill-down and ad hoc reports using SQL Server.
  • Created an OLTP connection between MySQL and Tableau Server.
  • Designed data models from scratch to aid team in data collection to be used in Tableau visuals.
  • Gained working knowledge in Data modeling, creating Star and Snowflake schemas.
  • Troubleshot test scripts, SQL queries, ETL jobs and data warehouse/data mart/data store models.
  • Used Pentaho Report Designer to create various reports with drill-down functionality by creating groups in the reports.
  • Created trend lines, log axes and statistics, and used groups, hierarchies and sets to create detail-level summary reports and dashboards.
  • Involved in creating database objects (tables, views, functions) using SQL to define, structure and maintain data effectively.

Environment: Tableau, MySQL, Oracle, SQL, PL/SQL, Pentaho.

Confidential

Java Developer

Responsibilities:

  • Interacted with the client on a regular basis to gather requirements.
  • Involved in Design and development of Data Marts for specific Business aspects which include Policy, Claim, Enterprise Billing and eQuote.
  • Understood the business, technical, and fundamental requirements.
  • Used Bash and text-processing tools (grep, sed, awk) to modify makefiles.
  • Consulted with implementers of the Internet LEC access product, supplying Perl programming and system design expertise.
  • Implemented the Hibernate/Spring Framework for the database and business layers.
  • Automated the test suite for validating the REST service requests and responses (JSON) used in the application, and compared the results in the Portal.
  • Involved in configuring Oracle stored procedures for data business logic.
  • Created PL/SQL stored procedures for the contract generation module.
  • Involved in configuring and deploying code to different environments (integration, QA and UAT).
  • Worked with Data Modeler and DBAs to build the data model and table structures.
  • Involved in developing learning manuals on key processes and procedures regarding JIRA and Confluence.
  • Prepared and designed system acceptance test cases and executed them.
  • Created a build script to build artifacts.
  • Worked on fine tuning the response time of Web service components.

Environment: Java, EJB, Servlets, Struts, Spring, REST, Hibernate, Perl, Tomcat, JIRA, WebLogic, Oracle 10g.