
Hadoop/Spark Developer Resume


New York, NY

SUMMARY:

  • 7+ years of IT experience, including 3 years as a Hadoop consultant on big-data conversion projects, gathering and analyzing customers' technical requirements.
  • Working experience with the Cloudera and Hortonworks Hadoop distributions.
  • Good domain knowledge of insurance, banking, and e-commerce.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, YARN, Hive, Sqoop, HBase, Flume, Oozie, NameNode, DataNode, and MapReduce concepts.
  • Experience working with MapReduce programs on Apache Hadoop to analyze large data sets efficiently.
  • Hands-on experience with ecosystem components such as Hive, Pig, Sqoop, MapReduce, Flume, and Oozie; strong knowledge of Pig and Hive analytical functions and of writing custom UDFs.
  • Experience importing and exporting data between HDFS and relational database systems using Sqoop.
  • Good knowledge of Spark and its components, such as Spark Core and Spark SQL.
  • Experienced in developing simple to complex MapReduce, Hive, and Pig jobs to handle files in multiple formats (JSON, text, XML, Avro, SequenceFile, etc.).
  • Expertise in J2EE frameworks, Servlets, JSP, JDBC, and XML; familiar with systems programming in C and C++.
  • Extensive in-depth knowledge of OOAD concepts, multithreading, and activity, sequence, and class diagrams in UML.
  • Experience using design patterns (Singleton, Factory, Builder) and the MVC architecture.
  • Have very good exposure to the entire Software Development Life Cycle.
  • Excellent organizational and interpersonal skills with a strong technical background.
  • Quick learner, self-motivated, able to work in challenging and versatile environments, with excellent written and verbal communication skills.
  • Good experience in performing and supporting Unit testing, System Integration testing (SIT), UAT and production support for issues raised by application users.

TECHNICAL SKILLS:

Languages/Scripting: Java, Python, Pig Latin, Scala, HiveQL, SQL, Linux shell scripts, JavaScript.

Big Data Framework/Stack: Hadoop HDFS, MapReduce, YARN, Hive, Hue, Impala, Sqoop, Pig, HBase, Spark, Kafka, Flume, Oozie, ZooKeeper, KNIME, etc.

Hadoop Distributions: Cloudera CDH 5.x, Hortonworks HDP 2.x

RDBMS: Oracle, DB2, SQL Server, MySQL

NoSQL Databases: HBase, MongoDB

Software Methodologies: SDLC - Waterfall, Agile (Scrum)

Operating Systems: Windows XP/NT/7/8, Red Hat Linux, CentOS, macOS

IDEs: NetBeans, Eclipse

File Formats: XML, text, SequenceFile, JSON, ORC, Avro, and Parquet.

PROFESSIONAL EXPERIENCE:

Confidential - New York, NY

Hadoop/Spark Developer

Responsibilities:

  • Used the Cloudera distribution for the Hadoop ecosystem.
  • Converted MapReduce jobs into Spark transformations and actions using Spark RDDs in Python (see the sketch after this list).
  • Wrote Spark jobs in Python to analyze customer and sales-history data.
  • Used Kafka to ingest data from many sources into HDFS.
  • Involved in designing HBase row keys to store text and JSON as key-values in HBase tables.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging it in HDFS for further analysis.
  • Wrote Python applications that interacted with the MySQL database through Spark's SQLContext and accessed Hive tables through HiveContext.
  • Created Hive external tables to perform ETL on data generated on a daily basis.
  • Created HBase tables for random lookups as required by the business logic.
  • Performed transformations using Spark and loaded the data into HBase tables.
  • Performed validation on the ingested data to filter and cleanse it in Hive.
  • Created Sqoop jobs to handle incremental loads from RDBMS into HDFS.
  • Imported data as Parquet files using Sqoop for some use cases to speed up later analytics.
  • Collected log data from web servers and pushed it to HDFS using Flume.
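
A minimal sketch of the kind of MapReduce-to-Spark RDD conversion described above, assuming a CSV of sales history with customer-ID and amount columns; the paths, column layout, and app name are hypothetical:

    from pyspark import SparkContext

    sc = SparkContext(appName="SalesHistory")

    # Map side of the old MapReduce job: emit (customer_id, amount) pairs.
    pairs = (sc.textFile("hdfs:///data/sales/history.csv")
               .map(lambda line: line.split(","))
               .map(lambda cols: (cols[0], float(cols[1]))))

    # Reduce side: sum amounts per customer (a Spark transformation).
    totals = pairs.reduceByKey(lambda a, b: a + b)

    # Spark action that triggers the job and writes results back to HDFS.
    totals.saveAsTextFile("hdfs:///output/customer_totals")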

Environment: Hadoop, Hive, Flume, Red Hat 6.x, shell scripting, Java, Eclipse, HBase, Kafka, Spark (Python), Oozie, ZooKeeper, CDH 5.x, HQL/SQL, Oracle 11g.

Confidential, Rosemont, IL

Hadoop Developer

Responsibilities:

  • Worked on the POC for the Apache Hadoop framework initiative.
  • Installed and configured Hadoop 0.22.0 (MapReduce, HDFS) and developed multiple MapReduce jobs in Java for data cleaning and preprocessing (a streaming-style cleaning sketch follows this list).
  • Imported and exported data between HDFS and Hive using Sqoop.
  • Created Hive tables, loaded them with data, and wrote Hive queries that run as MapReduce jobs in the backend.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Responsible for managing data coming from different sources.
  • Monitored the running MapReduce programs on the cluster.
  • Responsible for loading data from UNIX file systems into HDFS.
  • Installed and configured Hive, and wrote Hive UDFs.
  • Implemented workflows using the Apache Oozie framework to automate tasks.
  • Developed scripts to automate data management end to end and keep all clusters in sync.
  • Managed IT and business stakeholders; conducted assessment interviews and solution review sessions.
  • Reviewed developed code and flagged issues with respect to customer data.
  • Used SQL queries and other tools to perform data analysis and profiling.
  • Mentored and trained the engineering team in the use of the Hadoop platform, analytical software, and development technologies.
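
The cleaning jobs above were written in Java; purely as an illustration, the same kind of record-level cleaning can be sketched as a Hadoop Streaming mapper in Python (the field count, delimiter, and paths are assumptions):

    #!/usr/bin/env python
    # mapper.py - drop malformed records and normalize fields before loading.
    # Run as a map-only streaming job, e.g.:
    #   hadoop jar hadoop-streaming.jar -files mapper.py -mapper mapper.py \
    #     -numReduceTasks 0 -input /raw/events -output /clean/events
    import sys

    EXPECTED_FIELDS = 5  # assumed record width

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != EXPECTED_FIELDS:
            continue  # discard malformed rows
        print("\t".join(f.strip().lower() for f in fields))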

Environment: Apache Hadoop, Java (JDK 1.6), DataStax, flat files, Oracle 11g/10g, MySQL, Toad 9.6, Windows NT, CentOS, Sqoop, Hive, Oozie.

Confidential - Oakland, CA

Hadoop Developer

Responsibilities:

  • Involved in the complete big-data flow of the application, from ingesting upstream data into HDFS through processing and analyzing it.
  • Imported and exported data into HDFS using Sqoop and Kafka.
  • Created Hive tables and worked with them using HiveQL.
  • Created partitioned tables in Hive for better performance and faster querying (see the sketch after this list).
  • Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.
  • Worked on Hive UDFs over data in HDFS.
  • Performed extensive data analysis using Hive.
  • Executed different types of joins on Hive tables.
  • Used Impala for faster querying.
  • Created indexes and tuned SQL queries in Hive.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive jobs.
  • Developed HiveQL scripts to perform incremental loads.
  • Worked with different big-data file formats, such as text, SequenceFile, Avro, and Parquet, with Snappy compression.
  • Involved in identifying possible ways to improve the efficiency of the system.
  • Involved in generating data cubes for visualization.
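
A minimal sketch of the partitioned-table and incremental-load pattern above, expressed here through PyHive; the host, database, table, and column names, and the literal run date, are all assumptions:

    from pyhive import hive

    conn = hive.connect(host="hive-server", port=10000, database="sales")
    cur = conn.cursor()

    # Daily-partitioned table: partition pruning keeps queries fast.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS txns (txn_id STRING, amount DOUBLE)
        PARTITIONED BY (load_date STRING)
        STORED AS PARQUET
    """)

    # Incremental load: move only the latest day's rows from a staging table.
    cur.execute("""
        INSERT INTO TABLE txns PARTITION (load_date = '2016-03-01')
        SELECT txn_id, amount FROM txns_staging
        WHERE ingest_date = '2016-03-01'
    """)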

Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, Kafka, Oozie, Impala, Flume, MySQL, ZooKeeper, HBase, Cloudera Manager.

Confidential - Tampa, FL

Hadoop Developer

Responsibilities:

  • Responsible for managing data coming from different sources.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Developed Pig UDFs in Python for preprocessing the data (see the sketch after this list).
  • Worked extensively with flat files.
  • Performed join, grouping, and count operations on tables using Impala.
  • Developed Pig Latin scripts for validating different query modes.
  • Created a workflow to run multiple Hive and Pig jobs, which run independently based on time and data availability.
  • Created Sqoop jobs to export analyzed data to a relational database.
  • Created Hive tables, loaded data, and wrote Hive queries that run as MapReduce jobs.
  • Implemented bucketing, partitioning, and other query performance tuning techniques.
  • Generated various reports in Tableau with Hadoop as the data source.
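
A minimal sketch of a Python (Jython) Pig UDF of the preprocessing kind described above; the function, field, and file names are hypothetical:

    # clean_udfs.py - a Jython UDF; under Pig's Jython engine the
    # outputSchema decorator is provided by Pig (CPython streaming UDFs
    # would import it from pig_util). Registered from Pig Latin with:
    #   REGISTER 'clean_udfs.py' USING jython AS clean;
    #   cleaned = FOREACH raw GENERATE clean.normalize_phone(phone);
    @outputSchema("phone:chararray")
    def normalize_phone(raw):
        # Keep digits only, so '(415) 555-0100' becomes '4155550100'.
        if raw is None:
            return None
        return "".join(ch for ch in raw if ch.isdigit())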

Environment: Hadoop, MapReduce, Hive, Pig, Tableau, Python, Sqoop, Oozie, Impala, Flume, MySQL, ZooKeeper, HBase, Cloudera Manager.

Confidential, NYC, NY

Java Developer

Responsibilities:

  • Involved in the full Software Development Life Cycle (SDLC) of the tracking system: requirement gathering, conceptual design, analysis, detailed design, development, system testing, and user acceptance.
  • Worked in an Agile Scrum methodology.
  • Involved in writing exception and validation classes using core Java.
  • Designed and implemented the user interface using JSP, XSL, DHTML, Servlets, JavaScript, HTML, CSS, and AJAX.
  • Developed a framework using Java, MySQL, and web server technologies.
  • Developed and performed unit testing using the JUnit framework in a Test-Driven Development (TDD) environment.
  • Validated XML documents against XSD and transformed them to XHTML using XSLT (sketched after this list).
  • Implemented cross-cutting concerns as aspects at the service layer using Spring AOP, and implemented DAO objects using Spring ORM.
  • Used Spring beans to control the flow between the UI and Hibernate.
  • Developed web services using SOAP, WSDL, UDDI, and XML with the CXF framework and Apache Commons.
  • Worked on the database interaction layer for insert, update, and retrieval operations, writing queries and stored procedures.
  • Wrote stored procedures and complex queries for IBM DB2; implemented an SOA architecture with web services.
  • Used the Eclipse IDE for development and the JBoss application server for deploying the web application.
  • Used Apache Camel to create routes using web services.
  • Used JReport to generate the application's reports.
  • Used WebLogic as an application server and Log4j for application logging and debugging.
  • Used CVS for version control and ANT as the project build tool.
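
The validation and transformation step above was done in Java; purely as an illustration, the same XSD-validate-then-XSLT-transform flow can be sketched in Python with lxml (the file names are placeholders):

    from lxml import etree

    # Validate the document against its XSD before transforming it.
    schema = etree.XMLSchema(etree.parse("tracking.xsd"))
    doc = etree.parse("tracking.xml")
    schema.assertValid(doc)  # raises DocumentInvalid on failure

    # Apply the XSLT stylesheet to produce XHTML.
    transform = etree.XSLT(etree.parse("to_xhtml.xsl"))
    xhtml = transform(doc)
    print(str(xhtml))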

Environment: Java, HTML, CSS, JSTL, JavaScript, Servlets, JSP, Hibernate, Struts, Web Services, Eclipse, JBoss, JMS, JReport, Scrum, MySQL, IBM DB2, SOAP, WSDL, UDDI, AJAX, XML, XSD, XSLT, Oracle, Linux, Log4j, JUnit, ANT, CVS

Confidential

Java Developer

Responsibilities:

  • Involved in designing and developing enhancements per business requirements for the front end, using JSP with Struts.
  • Implemented the project using JSP- and Servlet-based tag libraries.
  • Conducted client-side validation using JavaScript.
  • Coded JDBC calls in the Servlets to access the Oracle database tables.
  • Generated SQL scripts to update parsed messages in the database.
  • Worked on parsing RSS feed (XML) files using SAX parsers.
  • Designed and coded the Java class that handles errors and logs them to a file.
  • Developed graphical user interfaces using Struts, Tiles, and JavaScript; used JSP, JavaScript, and JDBC to create web Servlets.
  • Utilized mail-merge techniques in MS Word to reduce the time spent sending certificates.
  • Involved in documentation, review, and analysis, and fixed post-production issues.
  • Worked on bug fixes and change-request enhancements.
  • Designed various animations and graphics using Macromedia Flash MX with ActionScript 1.0, PhotoImpact, and GIF Animator.
  • Understood customer requirements, mapped them to functional requirements, and created requirement specifications.
  • Developed web pages to display account transactions and built the application UI using GWT, Java, JSP, CSS, and web standards, improving usability while meeting tight deadlines.
  • Responsible for configuring the Struts web application via struts-config.xml and web.xml.
  • Modified Struts configuration files per application requirements and developed web services for non-Java clients to obtain user account details, using JSP, DHTML, Spring Web Flow, and CSS.

Environment: HTML/CSS/JavaScript/JSON, JDK 1.3, J2EE, Servlets, JavaBeans, MDB, JDBC, MS SQL Server, JBoss, frameworks & libraries (Struts, Spring MVC, jQuery), MVC concepts, XML, SVN.
