Hadoop Developer Resume
MD
PROFESSIONAL SUMMARY
- 9 years of professional IT experience encompassing a wide range of skills in Big Data and Java/J2EE technologies.
- 4 years of experience working with Big Data technologies on systems comprising massive amounts of data running in highly distributed mode on Cloudera and Hortonworks Hadoop distributions.
- Hands-on experience using Hadoop ecosystem components like Hadoop, Hive, Pig, Sqoop, HBase, Cassandra, Spark, Spark Streaming, Spark SQL, Oozie, ZooKeeper, Kafka, Flume, MapReduce and YARN.
- Excellent Core Java development skills and familiarity with coding business components using various Java APIs such as Multithreading and Collections.
- Wrote complex HiveQL queries for required data extraction from Hive tables and wrote Hive UDFs as required.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
- Experienced working with Spark Streaming, Spark SQL and Kafka for real-time data processing.
- Strong experience troubleshooting Spark applications and tuning performance for efficient memory handling.
- Worked on HBase to load and retrieve data for real-time processing using the REST API.
- Developed ETL workflows using Python and Oozie to process data in HDFS and HBase.
- Experience configuring ZooKeeper to coordinate the servers in clusters and maintain data consistency.
- Developed a NiFi flow prototype for data ingestion into HDFS.
- Set up Apache NiFi and performed a POC with NiFi to orchestrate a data pipeline.
- Good experience in general data analytics on distributed computing clusters like Hadoop using Apache Spark, Impala, and Scala.
- Strong experience analyzing large data sets by writing PySpark scripts and Hive queries.
- Extensive experience working with various enterprise Hadoop distributions, including Cloudera (CDH5) and Hortonworks, and good knowledge of Amazon EMR (Elastic MapReduce).
- Expertise in writing Hadoop jobs for analyzing data using HiveQL (queries), Pig Latin (data flow language), and custom MapReduce programs in Java.
- Developed various shell scripts and Python scripts to automate Spark jobs and Hive scripts.
- Experienced in using Pig scripts to do transformations, event joins, filters and some pre-aggregations before storing the data in HDFS.
- Experience with Maven and ANT for builds and continuous integration.
- Experience in implementing MVC frameworks like JSF, Spring MVC and ORM tools like Hibernate in J2EE architecture.
- Developed complex database objects like stored procedures, functions, packages and triggers using SQL and PL/SQL.
TECHNICAL SKILLSET
Big Data: HDFS, MapReduce, Hive, Pig, ZooKeeper, Apache Spark, NiFi Core, MLlib, Hortonworks, Spark SQL and DataFrames
Utilities: Sqoop, Flume, Kafka, Oozie and AutoSys
NoSQL Databases: HBase, Cassandra
Languages: C, C++, Java, Python, J2EE, PL/SQL, MR, Pig Latin, HiveQL, Unix shell scripting and Scala
Operating Systems: Sun Solaris, RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
Databases and Datawarehousing: Teradata, DB2, Oracle 9i/10g/11g, SQL Server, MySQL
Tools and IDE: Maven, Toad, Eclipse, NetBeans, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer
PROFESSIONAL EXPERIENCE
Confidential, MD
Hadoop Developer
Responsibilities:
- Working knowledge of Spark RDD, DataFrame API, Dataset API, Data Source API, Spark SQL and Spark Streaming.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Performed Spark join optimizations, troubleshot and monitored jobs, and wrote efficient code in Scala.
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Worked on analyzing the Hadoop stack and different big data analytic tools including Pig, Hive, the HBase database and Sqoop.
- Developed a Spark Streaming script that consumes topics from Kafka, a distributed messaging source, and periodically pushes batches of data to Spark for real-time processing.
- Experienced in implementing the Hortonworks distribution.
- Created Hive tables and worked on them for data analysis to meet the requirements.
- Developed a framework to load and transform large sets of unstructured data from UNIX systems into Hive tables.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Used Spark DataFrame operations to perform required validations on the data and to perform analytics on the Hive data.
- Loaded all data sets from source CSV files into Hive using Spark and into Cassandra using Spark/PySpark.
- Worked on Spark using Scala and Spark SQL for faster testing and processing of data.
- Migrated the computational code from HQL to PySpark.
- Completed data extraction, aggregation and analysis in HDFS using PySpark and stored the required data in Hive.
- Participated in development/implementation of Cloudera Hadoop environment.
- Developed Python code to gather data from HBase (Cornerstone) and designed the solution for implementation using PySpark.
- Experienced in working with Elastic MapReduce (EMR).
- Developed MapReduce programs for some refined queries on big data.
- In-depth understanding of classic MapReduce and YARN architecture.
- Worked with the business team to create Hive queries for ad hoc access.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Wrote Hive generic UDFs to implement business logic.
- Analyzed the data by running Hive queries, Pig scripts, Spark SQL and Spark Streaming.
- Installed and configured Pig for ETL jobs.
- Responsible for developing multiple Kafka Producers and Consumers from scratch as per the software requirement specifications.
- Involved in creating a generic Sqoop import script for loading data into Hive tables from RDBMS.
- Designed column families in Cassandra, ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra as per the business requirement.
- Extracted files from Cassandra through Sqoop and placed in HDFS for further processing.
- Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Created detailed AWS security groups, which acted as virtual firewalls that controlled the traffic allowed to reach one or more AWS EC2 instances.
- Involved in creating a data lake by extracting customer data from various sources into HDFS, including Excel files, databases, and server log data.
- Performed data integration with the goal of moving more data effectively, efficiently and with high performance to assist business-critical projects using Talend Data Integration.
- Designed, developed, unit tested, and supported ETL mappings and scripts for data marts using Talend.
- Used HUE for running Hive queries. Created partitions according to day using Hive to improve performance.
- Built a data flow pipeline using Flume, Java (MapReduce) and Pig.
- Involved in Agile methodologies, daily Scrum meetings, Sprint planning.
- Used version control tools like GitHub to share code among team members.
- Analyzed HBase data in Hive by creating external partitioned and bucketed tables.
- Performed a POC on single-member debugging on Spark and Hive.
Environment: Hadoop 2.x, Apache Spark, Spark-SQL, DataFrames, Scala, HDFS, HIVE, Oozie, Kafka, Autosys, Oracle, Teradata, Python/PySpark, MapReduce, Sqoop, HBase, Shell Scripting, PIG, Core Java, Cassandra, NiFi, Talend Big Data Integration, Cloudera Hadoop Distribution, PL/SQL, Toad, Windows NT, LINUX
Confidential, NJ
Hadoop Developer
Responsibilities:
- Worked on migrating MapReduce programs into Spark transformations using Scala.
- Responsible for building scalable distributed data solutions using Hadoop.
- Responsible for cluster maintenance, including adding and removing cluster nodes, cluster monitoring, troubleshooting, and managing and reviewing data backups and log files.
- Wrote complex MapReduce jobs in Java to extract, transform and aggregate terabytes of data.
- Collected and aggregated large amounts of stream data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
- Analyzed data using Hadoop components Hive and Pig.
- Scripted complex HiveQL queries on Hive tables to analyze large datasets and wrote complex Hive UDFs to work with sequence files.
- Set up Apache NiFi and performed a POC using NiFi to orchestrate data flows.
- Migrated the computational code from HQL to PySpark.
- Scheduled workflows using Oozie to automate multiple Hive and Pig jobs, which run independently based on time and data availability.
- Responsible for creating Hive tables, loading data and writing Hive queries to analyze data.
- Generated reports using QlikView.
- Worked with a senior engineer on configuring Kafka for streaming data.
- Developed Spark programs using Scala APIs to compare the performance of Spark with Hive and SQL.
- Experience in installation, configuration, support and management of Hadoop clusters using Hortonworks and Cloudera (CDH3, CDH4) distributions on Amazon Web Services (AWS).
- Developed and configured Kafka brokers to pipeline server log data into Spark Streaming.
- Wrote several Hive queries to extract valuable information hidden in large datasets.
- Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
- Built Spark scripts using Scala shell commands as required.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Collected all logs from source systems into HDFS using Kafka and performed analytics on them.
- Imported data from the Teradata database into HDFS and exported the analyzed pattern data back to Teradata using Sqoop.
- Worked on Spark using Scala and Spark SQL for faster testing and processing of data.
- Worked with Informatica to perform ETL jobs.
- Performed POCs in a Spark test environment.
- Wrote ETL jobs using DataStage to visualize data and generate reports from a MySQL database.
Environment: Hadoop, HDFS, Hive, Pig, Flume, Python, Kafka, HBase, Scala, Sqoop, Oozie, DataStage, Linux, Hortonworks Distribution, Relational Databases
Confidential, Quincy, MA
Big Data Engineer
Responsibilities:
- Extracted data from relational databases such as SQL Server and MySQL by developing Scala and SQL code
- Loaded the extracted data into Hive and combined new tables with existing databases
- Developed code to pre-process large sets of files in various formats such as Text, Avro, Sequence files, XML, JSON and Parquet
- Configured big data workflows to run on top of Hadoop, comprising heterogeneous jobs like Pig, Hive, Sqoop and MapReduce
- Loaded various formats of structured and unstructured data from Linux file system to HDFS
- Used Combiners and Partitioners in MapReduce programming
- Wrote Pig scripts to ETL the data into a NoSQL database for faster analysis
- Read data from Flume and pushed batches of data to HDFS and HBase for real-time processing of the files
- Parsed XML data into a structured format and loaded it into HDFS
- Scheduled various ETL processes and Hive scripts by developing Oozie workflows
- Utilized Tableau to visualize the analyzed data and performed report design and delivery
- Created a POC for the Flume implementation
- Involved in reviewing both functional and non-functional aspects of the business model
- Took the lead in communicating and presenting the models to business customers and executives
Environment: Hadoop, HDFS, Map Reduce, Sqoop, HBase, Shell Scripting, PIG, HIVE, Oozie, Core Java, Hortonworks Distribution, LINUX
Confidential, Albany, NY
Java/J2EE Developer
Responsibilities:
- Understood the business requirements and prepared the design document.
- Reviewed business requirements and discussed the design with the application architect.
- Used the Value/Transfer Object, Singleton, Data Access Object and Factory design patterns.
- Developed a batch processing framework using the executor service framework to cascade multiple changes across multiple records in a single transaction.
- Responsible for developing Java components using Spring, Spring JDBC, Spring Transaction Management.
- Created and implemented microservices and REST APIs using Spring Boot, REST and JSON.
- Used Spring JDBC in the persistence layer, which is capable of handling high-volume transactions.
- Implemented the service layer using Spring with transaction and logging interceptors.
- Used the Spring framework for the middle tier and Spring JDBC templates for data access.
- Developed web services using both SOAP/WSDL and REST.
- Participated in discussions with business analysts and analyzed the feasibility of the requirements.
- Drew sequence diagrams and class diagrams using UML.
- Created new tables and sequences and wrote SQL queries and PL/SQL in Oracle.
- Developed service layer by using Spring MVC.
- Developed the user interface using JSF, JSP, HTML, JavaScript, CSS and Ajax. Produced and consumed SOAP web services.
- Utilized Agile Methodologies to manage full life-cycle development of the project.
- Implemented MVC design pattern using Spring Framework.
- Used Maven and configured Jenkins to build and deploy the application.
- Used form classes of the Spring Framework to write the routing logic and to call different services.
- Used Spring DAO to connect with the database.
Environment: Java JDK 1.7, Oracle 11g, Eclipse, Spring MVC, Web services, Microservices, Agile Methodology, Java/J2EE, SQL, PL/SQL, JSP, iBATIS, Apache Tomcat 7, HTML, JavaScript, JDBC, XML, XSLT, UML, JUnit, log4j, SVN and Maven.
Confidential
Java Developer
Responsibilities:
- Designed, developed, tested and implemented scalable online systems using Java, J2EE, JSP, Servlets and Oracle Database.
- Created UML class and sequence diagrams using Rational Rose.
- Implemented the MVC architecture using Spring Framework.
- Used JavaScript, HTML for creating interactive front-end screens.
- Extensively used Custom JSP tags to separate presentation from application logic.
- Developed JSF custom components and custom tag libraries for implementing the interfaces.
- Developed Servlets, JSP pages, JavaScript and worked on integration.
- Involved in developing presentation layer using JSPs and model layer using EJB Session Beans.
- Coordinated with QA on testing, production releases, application deployment and integration, and conducted walk-through code reviews.
- Involved in building and parsing XML documents.
- Documented all source code developed.
- Involved in writing SQL queries, stored procedures and PL/SQL for the back end.
- Used Views and Functions at the Oracle Database end.
- Developed Ant build scripts for compiling and building the application.
- Used Maven as a build tool and wrote the dependencies for the JARs that needed to be migrated.
- Configured and deployed the application on IBM WebSphere Application Server.
- Developed JUnit test cases and performed integration and system testing.
- Coordinated with other development teams, system managers and the webmaster, and fostered a good working environment.
- Provided Core Java training to freshers.
Environment: Java, J2EE, JSP, MVC, Servlets, Spring, XML, HTML, JavaScript, JSON, Oracle, MySQL, JUnit, PL/SQL, JDBC, ANT script, Maven, IBM WebSphere
Confidential
JR Software Engineer
Responsibilities:
- Worked with a team of developers on Python applications for risk management
- Created SQL queries to pull data from the relational databases
- Gathered business requirements and converted them into SQL stored procedures for database-specific projects
- Developed the DAO layer for the application using Spring Hibernate Template support
- Developed UI using HTML, JavaScript, and JSP, and developed Business Logic and Interfacing components using Business Objects, XML, and JDBC.
- Designed the user interface and implemented validation checks using JavaScript.
- Managed connectivity using JDBC for querying, inserting and data management, including triggers and stored procedures.
- Developed various EJBs for handling business logic and data manipulations from database.
- Involved in the design of JSPs and Servlets for navigation among the modules.
- Designed cascading style sheets and the XML part of the Order Entry and Product Search modules and performed client-side validations with JavaScript.
- Developed Tableau visualizations and dashboards using Tableau Desktop
- Designed and developed data management system using MySQL
- Wrote Python scripts to parse XML documents and load the data into the database
- Expertise in writing constraints, indexes, views, stored procedures, cursors, triggers and user-defined functions
- Created a unit test/regression test framework for existing and new code
- Interfaced with third-party vendors to customize UI/UX solutions
- Implemented page designs in standards-compliant dynamic XHTML and CSS
Environment: Python, Django, MySQL, Linux, HTML, XHTML, SVN, CSS, AJAX, Bugzilla, JavaScript, Apache Web Server
