Hadoop Developer/Spark Developer Resume
SUMMARY
- Over 6 years of hands-on IT industry experience, including development with Big Data/Hadoop ecosystem tools, databases, and Java/J2EE technologies.
- Experience with the Hadoop 2.0 YARN architecture and with developing YARN applications on it.
- Good experience processing unstructured, semi-structured, and structured data.
- Thorough understanding of HDFS and the MapReduce framework, with extensive experience developing MapReduce jobs.
- Expertise in Hadoop core components and environment administration, plus Hive, Pig, Sqoop, Oozie, Flume, Hue, etc.
- Experienced in building highly scalable big data solutions on Hadoop across multiple distributions (Cloudera, Hortonworks) and NoSQL platforms.
- Good exposure to MapReduce programming in Java, Pig Latin scripting, distributed applications, and HDFS.
- Experience installing, configuring, supporting, and managing Hadoop clusters on Cloudera and Hortonworks distributions and on Amazon Web Services (AWS).
- Hands-on experience with major Hadoop ecosystem components including Hive, HBase, HBase-Hive integration, Pig, Sqoop, Flume, and Kafka, plus knowledge of the MapReduce/HDFS framework.
- Experience loading tuple-shaped data into Pig and generating tuples from flat data. Able to build User-Defined Functions (UDFs) for functionality not available in core Hadoop (see the UDF sketch after this list).
- Able to move data in and out of Hadoop from RDBMS, NoSQL, and UNIX systems using Sqoop and other traditional data movement technologies.
- Good experience with HBase schema design.
- Experience with Hadoop distributions such as Cloudera, Hortonworks, MapR, and Windows Azure, and with Impala.
- Maintained, audited, and built new clusters for testing purposes using Cloudera Manager.
- Implemented Cassandra and MongoDB NoSQL clusters as part of a POC to address HBase limitations.
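A minimal sketch of the kind of custom Hive UDF referenced above, using Hive's old-style UDF API (common on the CDH versions listed later in this resume). The class name, date formats, and input values are illustrative, not taken from any project here:

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import java.text.SimpleDateFormat

// Hive instantiates this class and calls evaluate() once per row; null in, null out.
class ToIsoDate extends UDF {
  def evaluate(raw: String): String = {
    if (raw == null) return null
    val in  = new SimpleDateFormat("MM/dd/yyyy")   // assumed source format
    val out = new SimpleDateFormat("yyyy-MM-dd")   // ISO target format
    out.format(in.parse(raw))
  }
}
```

Packaged into a jar, a UDF like this would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before use in queries.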
TECHNICAL SKILLS
Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Spark, Storm, Kafka, Oozie, MongoDB, Cassandra
Languages: C, Core Java, UNIX shell scripting, SQL, Python, C#, Scala
J2EE Technologies: Servlets, JSP, JDBC, Java Beans.
Methodologies: Agile, UML, Design Patterns (Core Java and J2EE).
NoSQL Technologies: Cassandra, MongoDB, Hbase
Operating Systems: Windows XP/10, Linux, Sandbox.
Software Package: MS Office 2010.
Tools & Utilities: Eclipse, NetBeans, MyEclipse, SVN, Git, Maven, SOAP UI, JMX Explorer, XMLSpy, QC, QTP, Jira
Web Servers: WebLogic, WebSphere, Apache Tomcat.
Web Technologies: HTML, XML, JavaScript, jQuery, AJAX, SOAP, and WSDL.
PROFESSIONAL EXPERIENCE
Confidential
Hadoop Developer/Spark Developer
Responsibilities:
- Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Databricks connectors, Spark Core, Spark SQL, Sqoop, Pig, Hive, Impala, and NoSQL databases.
- Designed and implemented data imports into HDFS from different RDBMS servers using Sqoop.
- Used incremental and delta Sqoop imports on Teradata tables having no primary keys, loading them into Hive for transformations and aggregations.
- Exported data back to RDBMS servers using Sqoop and processed that data in ETL operations.
- Developed and implemented custom Hive UDFs involving date functions.
- Used Impala for querying the HDFS data.
- Applied partitioning, bucketing, map-side joins, and parallel execution to optimize Hive queries, cutting execution time from days to hours (see the Hive/Spark sketch after this list).
- Implemented Pig scripts and used skewed, replicated, and merge joins for performance improvements.
- Transformed data from HBase to Hive as bulk operations.
- Developed Oozie workflows and sub-workflows coordinating hundreds of Sqoop queries, MapReduce jobs, Pig scripts, and Hive queries.
- Handled different data formats, including Avro, Parquet, and ORC.
- Implemented Spark scripts in Scala, using Spark SQL to read Hive tables into Spark for faster data processing (also sketched below).
- Active member in developing a POC on streaming data using Apache Kafka and Spark Streaming (sketched after this list).
- Worked with Autosys scheduler to automate the jobs.
- Worked in an Agile environment with two-week sprint cycles, dividing and organizing tasks; participated in daily scrums and other design-related meetings.
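A hedged sketch of the Hive-table access and optimization patterns named above, assuming Spark 2.x with Hive support; the table, columns, date, and broadcast threshold are placeholders, not project values:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-access-sketch")
  .enableHiveSupport()                 // lets Spark SQL read and write Hive tables
  .getOrCreate()

// Partitioned Hive table: partition pruning lets queries skip whole directories.
// (Bucketing, CLUSTERED BY ... INTO n BUCKETS, would be added on the Hive side.)
spark.sql("""
  CREATE TABLE IF NOT EXISTS sales (txn_id BIGINT, store_id INT, amount DOUBLE)
  PARTITIONED BY (txn_date STRING)
  STORED AS ORC
""")

// Spark's analog of a Hive map-side join: broadcast small tables in joins.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 50L * 1024 * 1024)

// Read the Hive table into Spark for faster in-memory processing.
val daily = spark.sql(
  "SELECT store_id, SUM(amount) AS total FROM sales " +
  "WHERE txn_date = '2017-01-01' GROUP BY store_id")
daily.show()
```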
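And a sketch of the Kafka plus Spark Streaming POC pattern, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic, group id, and batch interval are made up for illustration:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

val conf = new SparkConf().setAppName("kafka-streaming-sketch")
val ssc  = new StreamingContext(conf, Seconds(10))   // 10s micro-batches

// Placeholder connection settings, not project values.
val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "broker1:9092",
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "poc-consumer",
  "auto.offset.reset"  -> "latest"
)

// Direct stream: each Kafka partition maps to one Spark partition.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
)

// Count events per micro-batch as a stand-in for the real processing logic.
stream.map(_.value).count().print()

ssc.start()
ssc.awaitTermination()
```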
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Oozie, Java, UNIX, Flume, Spark, Kafka, Scala, Impala, HBase, Oracle, Cloudera Distribution, Autosys.
Confidential
Hadoop Developer
Responsibilities:
- Well-versed in Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and MapReduce programming.
- Developed MapReduce programs to remove irregularities from the data and aggregate it.
- Implemented Hive UDFs and performed tuning for better query performance.
- Developed Pig Latin scripts to extract data from log files and store it in HDFS; created User-Defined Functions (UDFs) to pre-process the data for analysis.
- Implemented optimized map joins to gather data from different sources and perform cleaning operations before applying the algorithms.
- Used Sqoop to import and export data between Oracle and HDFS/Hive.
- Implemented CRUD operations on HBase data using the Thrift API to get real-time insights.
- Developed Oozie workflows to manage and schedule jobs on the Hadoop cluster, generating reports on a nightly, weekly, and monthly basis.
- Used various compression codecs to effectively compress the data in HDFS.
- Used Avro SerDes for serialization and deserialization, and implemented custom Hive UDFs involving date functions.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Implemented a POC migrating MapReduce jobs to Spark RDD transformations (see the sketch after this list).
- Worked in an Agile environment with two-week sprint cycles, dividing and organizing tasks; participated in daily scrums and other design-related meetings.
- Created final reports on the analyzed data using Apache Hue and the Hive browser, and generated graphs for study by the data analytics team.
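A minimal sketch of the MapReduce-to-Spark migration named above, using word count as a stand-in for the real job; the HDFS paths are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// The classic MapReduce word count, re-expressed as Spark RDD transformations.
val sc = new SparkContext(new SparkConf().setAppName("mr-to-rdd-sketch"))

val counts = sc.textFile("hdfs:///data/input")   // placeholder input path
  .flatMap(_.split("\\s+"))    // mapper: emit one token per word
  .map(word => (word, 1))      // mapper: key each word with a count of 1
  .reduceByKey(_ + _)          // reducer: sum counts per key

counts.saveAsTextFile("hdfs:///data/output")     // placeholder output path
```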
Environment: Hadoop, CDH 5.5, MapReduce, Hive, Pig, Sqoop, Flume, HBase, Java, Spark, Oozie, Linux, UNIX
Confidential
Java Developer
Responsibilities:
- Key responsibilities included requirements gathering, designing and developing the Java application.
- Identified and fixed transactional issues caused by incorrect exception handling, and concurrency issues caused by unsynchronized blocks of code.
- Created a Java application module that authenticates users of the application and synchronizes handsets with the Exchange server.
- Performed unit testing, system testing, and user acceptance testing.
- Built web applications using the Struts MVC framework.
- Gathered specifications for the Library site from different departments and users of the services.
- Developed stored procedures and triggers in PL/SQL, and wrote SQL scripts to create and maintain the database, roles, users, tables, views, procedures, and triggers.
- Designed and implemented the UI using HTML and Java.
- Worked on the database interaction layer for insert, update, and retrieval operations on data (see the JDBC sketch after this list).
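A hedged sketch of the database interaction pattern above (a parameterized insert over JDBC). The connection string, credentials, table, and columns are placeholders; it is written in Scala for consistency with the other sketches in this resume, though the original project used Java:

```scala
import java.sql.DriverManager

// Placeholder Oracle connection details, not project values.
val conn = DriverManager.getConnection(
  "jdbc:oracle:thin:@//dbhost:1521/APP", "app_user", "app_password")

try {
  // Parameterized statement: values are bound, never concatenated into SQL.
  val stmt = conn.prepareStatement(
    "INSERT INTO members (member_id, full_name) VALUES (?, ?)")
  stmt.setLong(1, 42L)
  stmt.setString(2, "Jane Doe")
  stmt.executeUpdate()   // returns the number of affected rows
  stmt.close()
} finally {
  conn.close()           // always release the connection
}
```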
Environment: Core Java, JDBC, Struts, HTML, SQL, Oracle 10g, PL/SQL, IBM Rational, Eclipse IDE
Confidential
Programmer Analyst/ SQL Developer
Responsibilities:
- Developed SQL scripts performing various joins, subqueries, nested queries, and insert/update/delete operations on MS SQL database tables.
- Experience writing PL/SQL and developing and implementing stored procedures, packages, and triggers.
- Experience with modeling principles, database design and programming, and creating E-R diagrams and data relationships to design a database.
- Responsible for designing advanced SQL queries, procedures, cursors, and triggers.
- Built data connections to the database using MS SQL Server.
- Worked on a project extracting data from XML files into SQL tables and generating data file reports using SQL Server 2008.
Environment: PL/SQL, MySQL, SQL Server 2008 (SSRS & SSIS), Visual Studio 2000/2005, MS Excel.