Hadoop/Spark Developer Resume
New York, NY
SUMMARY:
- 7+ years of IT experience, including 3 years as a Hadoop consultant on big data conversion projects, gathering and analyzing customers' technical requirements.
- Working experience with the Cloudera and Hortonworks Hadoop distributions.
- Good domain knowledge of Insurance, Banking, and E-commerce.
- In-depth understanding of Hadoop architecture and its components, including HDFS, YARN, Hive, Sqoop, HBase, Flume, Oozie, NameNode, DataNode, and MapReduce concepts.
- Experience writing MapReduce programs on Apache Hadoop to analyze large data sets efficiently.
- Hands-on experience with ecosystem tools such as Hive, Pig, Sqoop, MapReduce, Flume, and Oozie; strong knowledge of Pig and Hive analytical functions and of writing custom UDFs (see the sketch after this list).
- Experience importing and exporting data between HDFS and relational database systems using Sqoop.
- Good knowledge of Spark and its components, such as Spark Core and Spark SQL.
- Experienced in developing simple to complex MapReduce, Hive, and Pig jobs that handle files in multiple formats (JSON, text, XML, Avro, SequenceFile, etc.).
- Expertise in J2EE frameworks, Servlets, JSP, JDBC, and XML; familiar with systems programming in C and C++.
- In-depth knowledge of OOAD concepts and multithreading; experienced in creating activity, sequence, and class diagrams using UML.
- Experience using design patterns (Singleton, Factory, Builder) and MVC architecture.
- Have very good exposure to the entire Software Development Life Cycle.
- Excellent organizational and interpersonal skills with a strong technical background.
- Quick learner with the ability to work in challenging and versatile environments; self-motivated, with excellent written and verbal communication skills.
- Good experience in performing and supporting Unit testing, System Integration testing (SIT), UAT and production support for issues raised by application users.
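To make the custom-UDF claim above concrete, here is a minimal sketch of a Hive UDF, assuming the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name and normalization logic are hypothetical, not taken from a real project:

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical Hive UDF: normalizes a string column (trim + lowercase).
public class NormalizeString extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null; // preserve Hive NULLs
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

In Hive such a UDF would be registered with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.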
TECHNICAL SKILLS:
Languages/Scripting: Java, Python, Pig Latin, Scala, HiveQL, SQL, Linux shell scripts, JavaScript.
Big Data Framework/Stack: Hadoop HDFS, MapReduce, YARN, Hive, Hue, Impala, Sqoop, Pig, HBase, Spark, Kafka, Flume, Oozie, Zookeeper, KNIME, etc.
Hadoop Distributions: Apache Hadoop, Cloudera CDH5, Hortonworks HDP 2.x
RDBMS: Oracle, DB2, SQL Server, MySQL
NoSQL Databases: HBase, MongoDB
Software Methodologies: SDLC - Waterfall, Agile/Scrum
Operating Systems: Windows XP/NT/7/8, Red Hat Linux, CentOS, Mac
IDEs: NetBeans, Eclipse
File Formats: XML, text, SequenceFile, JSON, ORC, Avro, and Parquet.
PROFESSIONAL EXPERIENCE:
Confidential - New York, NY
Hadoop/Spark Developer
Responsibilities:
- Used the Cloudera distribution of the Hadoop ecosystem.
- Converted MapReduce jobs into Spark transformations and actions using Spark RDDs in Python.
- Wrote Spark jobs in Python to analyze customer and sales-history data.
- Used Kafka to ingest data from multiple sources into HDFS.
- Designed HBase row keys to store text and JSON values in HBase tables (see the sketch after this list).
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Wrote Python applications that interact with the MySQL database through Spark's SQLContext and access Hive tables through HiveContext.
- Created Hive external tables to perform ETL on data generated on a daily basis.
- Created HBase tables for random lookups as per requirement of business logic.
- Performed transformations using Spark and loaded the data into HBase tables.
- Performed validation on the data ingested to filter and cleanse the data in Hive.
- Created Sqoop jobs to handle incremental loads from RDBMS into HDFS.
- Imported data as Parquet files using Sqoop for some use cases to improve processing speed for later analytics.
- Collected log data from web servers and pushed it to HDFS using Flume.
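A minimal sketch of the row-key design described above, assuming the HBase 1.x client API shipped with later CDH 5 releases; the table name, column family, and key layout are illustrative assumptions, not the actual project schema:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical composite row key: <customerId>|<reversed timestamp>,
// so a customer's most recent events sort first in a scan.
public class EventWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("customer_events"))) {
            String customerId = "C12345";
            long reversedTs = Long.MAX_VALUE - System.currentTimeMillis();
            byte[] rowKey = Bytes.toBytes(customerId + "|" + reversedTs);

            Put put = new Put(rowKey);
            // One column family holds both the raw text and the JSON payload.
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("text"),
                          Bytes.toBytes("order shipped"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("json"),
                          Bytes.toBytes("{\"orderId\":42,\"status\":\"shipped\"}"));
            table.put(put);
        }
    }
}
```

Reversing the timestamp is one common way to avoid hotspotting on monotonically increasing keys while keeping recent data cheap to read.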
Environment: Hadoop, Hive, Flume, Red Hat 6.x, shell scripting, Java, Eclipse, HBase, Kafka, Spark (Python), Oozie, Zookeeper, CDH 5.x, HQL/SQL, Oracle 11g.
Confidential, Rosemont, IL
Hadoop Developer
Responsibilities:
- Worked on the POC for the Apache Hadoop framework initiative.
- Installed and configured Hadoop 0.22.0 MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the sketch after this list).
- Imported and exported data into HDFS and Hive using Sqoop.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Implemented partitioning, dynamic partitions, and bucketing in Hive.
- Responsible for managing data coming from different sources.
- Monitored the running MapReduce programs on the cluster.
- Responsible for loading data from UNIX file systems to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Developed scripts that automated data management end to end and kept all the clusters in sync.
- Managed IT and business stakeholders; conducted assessment interviews and solution review sessions.
- Reviewed the developed code and flagged any issues with respect to customer data.
- Used SQL queries and other tools to perform data analysis and profiling.
- Mentored and trained the engineering team in the use of the Hadoop platform, analytical software, and development technologies.
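A hedged sketch of the kind of data-cleaning MapReduce job described above, written against the stable org.apache.hadoop.mapreduce API (shown in the later Job.getInstance style); the delimiter and expected field count are assumptions:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical map-only cleaning job: drops delimited records with the
// wrong field count and trims whitespace from the remaining fields.
public class CleanRecordsJob {

    public static class CleanMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {
        private static final int EXPECTED_FIELDS = 5; // assumption

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", -1);
            if (fields.length != EXPECTED_FIELDS) {
                return; // skip malformed record
            }
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) out.append(',');
                out.append(fields[i].trim());
            }
            context.write(NullWritable.get(), new Text(out.toString()));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "clean-records");
        job.setJarByClass(CleanRecordsJob.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0); // map-only: cleaned rows go straight to HDFS
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```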
Environment: Apache Hadoop, Java (JDK 1.6), DataStax, flat files, Oracle 11g/10g, MySQL, Toad 9.6, Windows NT, CentOS, Sqoop, Hive, Oozie.
Confidential - Oakland, California
Hadoop Developer
Responsibilities:
- Involved in the complete big data flow of the application, from upstream data ingestion into HDFS to processing and analyzing the data in HDFS.
- Imported and exported data into HDFS using Sqoop and Kafka.
- Created Hive tables and worked on them using HiveQL.
- Created partitioned tables in Hive for best performance and faster querying.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Worked on Hive UDFs using data from HDFS.
- Performed extensive data analysis using Hive.
- Executed different types of joins on Hive tables.
- Used Impala for faster querying purposes.
- Created indexes and tuned the SQL queries in Hive.
- Used the Oozie workflow engine to schedule multiple Hive jobs.
- Developed HiveQL scripts to perform the incremental loads (see the sketch after this list).
- Worked with different big data file formats such as text, SequenceFile, Avro, and Parquet, and with Snappy compression.
- Involved in identifying possible ways to improve the efficiency of the system.
- Involved in generating data cubes for visualization.
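The incremental loads above would typically be HiveQL scripts scheduled through the Oozie workflows mentioned earlier; as a Java sketch, here is the same daily partition insert issued through the HiveServer2 JDBC driver. The host, table, and column names are hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Hypothetical daily incremental load into a partitioned Hive table
// over HiveServer2 JDBC. Host/table/columns are assumptions; a real
// script would validate the date argument before building the query.
public class DailyIncrementalLoad {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver.example.com:10000/default", "etl", "");
             Statement stmt = conn.createStatement()) {
            String loadDate = args[0]; // e.g. "2016-03-14"
            // Append only the new day's rows into the matching partition.
            stmt.execute(
                "INSERT INTO TABLE sales_partitioned PARTITION (load_date='"
                + loadDate + "') "
                + "SELECT order_id, customer_id, amount FROM sales_staging "
                + "WHERE event_date = '" + loadDate + "'");
        }
    }
}
```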
Environment: Hadoop, Hive, Pig, Sqoop, Kafka, Oozie, Impala, Flume, MySQL, Zookeeper, HBase, Cloudera Manager, MapReduce.
Confidential - Tampa, Florida
Hadoop Developer
Responsibilities:
- Responsible for managing data coming from different sources.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Developed Pig UDFs in Python for preprocessing the data (a Java equivalent is sketched after this list).
- Worked extensively on flat files.
- Performed join, grouping, and count operations on the tables using Impala.
- Developed Pig Latin scripts for validating different query modes.
- Created workflows to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Created Sqoop jobs to export analyzed data to a relational database.
- Created Hive tables, loaded data, and wrote Hive queries that run as MapReduce jobs in the backend.
- Implemented bucketing, partitioning and other query performance tuning techniques.
- Generated various reports using Tableau with Hadoop as a source for data.
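The preprocessing UDFs above were written in Python; to keep these sketches in a single language, here is an equivalent (hypothetical) Pig UDF as a Java EvalFunc:

```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical preprocessing UDF: trims and upper-cases a chararray
// field, mirroring the kind of cleanup a Python UDF might have done.
public class TrimUpper extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null; // pass NULLs through
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}
```

In Pig Latin it would be registered with REGISTER udfs.jar; and called as TrimUpper(field) inside a FOREACH ... GENERATE.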
Environment: Hadoop, MapReduce, Hive, Pig, Tableau, Python, Sqoop, Oozie, Impala, Flume, MySQL, Zookeeper, HBase, Cloudera Manager.
Confidential, NYC, NY
Java Developer
Responsibilities:
- Involved in the full Software Development Life Cycle (SDLC) of the tracking system: requirements gathering, conceptual design, analysis, detailed design, development, system testing, and user acceptance.
- Worked in Agile Scrum methodology
- Involved in writing exception and validation classes using core Java.
- Designed and implemented the user interface using JSP, XSL, DHTML, Servlets, JavaScript, HTML, CSS and AJAX
- Developed framework using Java, MySQL and web server technologies
- Developed and performed unit testing using the JUnit framework in a test-driven development (TDD) environment (see the sketch after this list).
- Validated the XML documents with XSD validation and transformed them to XHTML using XSLT.
- Implemented cross-cutting concerns as aspects at the service layer using Spring AOP, and implemented DAO objects using Spring ORM.
- Used Spring beans for controlling the flow between the UI and Hibernate.
- Implemented an SOA architecture with Web Services using SOAP, WSDL, UDDI, and XML, with the CXF framework and Apache Commons.
- Worked on the database interaction layer for insert, update, and retrieval operations, using queries and stored procedures.
- Wrote stored procedures and complicated queries for IBM DB2.
- Used Eclipse IDE for development and JBoss Application Server for deploying the web application
- Used Apache Camel for creating routes using Web Services.
- Used JReport for the generation of reports of the application
- Used WebLogic as an application server and Log4j for application logging and debugging.
- Used CVS as the version control tool and ANT as the project build tool.
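A minimal sketch of the TDD workflow described above: a JUnit 4 test written against a small validation class like those mentioned in the bullets. The class, method names, and validation rule are hypothetical, not from the actual tracking system:

```java
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;
import org.junit.Test;

// Hypothetical validation class plus the test written first, TDD-style.
class AccountValidator {
    boolean isValidAccountNumber(String s) {
        return s != null && s.trim().matches("ACC-\\d{5}");
    }
    String normalize(String s) {
        return s.trim().toUpperCase();
    }
}

public class AccountValidatorTest {
    @Test
    public void rejectsEmptyAccountNumber() {
        assertFalse(new AccountValidator().isValidAccountNumber(""));
    }

    @Test
    public void normalizesBeforeValidating() {
        AccountValidator v = new AccountValidator();
        assertTrue(v.isValidAccountNumber(v.normalize(" acc-00042 ")));
    }
}
```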
Environment: Java, HTML, CSS, JSTL, JavaScript, Servlets, JSP, Hibernate, Struts, Web Services, Eclipse, JBoss, JMS, JReport, Scrum, MySQL, IBM DB2, SOAP, WSDL, UDDI, AJAX, XML, XSD, XSLT, Oracle, Linux, Log4j, JUnit, ANT, CVS.
Confidential
Java Developer
Responsibilities:
- Involved in designing and developing enhancements per business requirements for front-end JSP development using Struts.
- Implemented the project using JSP- and Servlet-based tag libraries.
- Conducted client-side validations using JavaScript.
- Coded JDBC calls in the Servlets to access the Oracle database tables.
- Generated SQL scripts to update the parsed messages in the database.
- Worked on parsing RSS feed (XML) files using SAX parsers (see the sketch after this list).
- Designed and coded the Java class that handles errors and logs them to a file.
- Developed graphical user interfaces using Struts, Tiles, and JavaScript; used JSP, JavaScript, and JDBC to create web Servlets.
- Utilized the mail merge techniques in MS Word for time reduction in sending certificates.
- Involved in documentation, review, and analysis, and fixed post-production issues.
- Worked on bug fixing and enhancements on change requests.
- Designed various animations and graphics using Macromedia Flash MX with ActionScript 1.0, PhotoImpact, and GIF Animator.
- Understood customer requirements, mapped them to functional requirements, and created requirement specifications.
- Developed web pages to display account transactions and created the application UI using GWT, Java, JSP, CSS, and web standards, improving application usability while meeting tight deadlines.
- Responsible for configuring the Struts web-based application using struts-config.xml and web.xml.
- Modified Struts configuration files per application requirements and developed web services for non-Java clients to obtain user account details, using JSP, DHTML, Spring Web Flow, and CSS.
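A minimal sketch of the SAX-based RSS parsing described above, using the standard javax.xml.parsers API; the feed path and the choice to collect item titles are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX handler: collects the <title> of each <item> in an
// RSS feed, buffering character data between start and end events.
public class RssTitleHandler extends DefaultHandler {
    private final List<String> titles = new ArrayList<String>();
    private boolean inItem = false;
    private boolean inTitle = false;
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        if ("item".equals(qName)) inItem = true;
        if (inItem && "title".equals(qName)) {
            inTitle = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inTitle) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (inItem && "title".equals(qName)) {
            titles.add(text.toString().trim());
            inTitle = false;
        }
        if ("item".equals(qName)) inItem = false;
    }

    public static void main(String[] args) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        RssTitleHandler handler = new RssTitleHandler();
        parser.parse("feed.xml", handler); // path is a placeholder
        for (String title : handler.titles) System.out.println(title);
    }
}
```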
Environment: HTML/CSS/JavaScript/JSON, JDK 1.3, J2EE, Servlets, JavaBeans, MDB, JDBC, MS SQL Server, JBoss, frameworks & libraries (Struts, Spring MVC, jQuery), MVC concepts, XML, SVN.