Hadoop Developer Resume
Eden Prairie, Minnesota
SUMMARY:
- 6+ years of professional IT experience in the analysis, design, development, deployment and maintenance of critical software and big data applications.
- 4+ years of hands-on experience with Hadoop, including extensive experience with Big Data technologies.
- Hands-on experience in developing and deploying enterprise applications using major Hadoop ecosystem components such as MapReduce, YARN, Hive, Pig, HBase, Flume, Sqoop, Spark Streaming, Spark SQL, Storm, Kafka, NiFi, Oozie, Zookeeper and Cassandra.
- Hands-on experience with multiple distributions, including Cloudera, Hortonworks and MapR.
- Experience in installing, configuring, supporting and managing the Cloudera Hadoop platform, including CDH4 and CDH5 clusters.
- Used Pig and Python scripting to preprocess data.
- Experience in managing Hadoop clusters using Cloudera Manager.
- Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
- Experience in automating Sqoop, Hive and Pig scripts using the Oozie workflow scheduler.
- Hands-on experience in application development using Java, Scala and Linux shell scripting.
- Hands-on experience using the MapReduce programming model for batch processing of data stored in HDFS.
- Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
- Very good experience in Python and shell scripting.
- Extensive experience in developing Pig Latin scripts and using Hive Query Language for data analytics.
- Experience in optimizing Hive queries and joins and in using different data files with custom SerDes.
- Experience with different file formats, including CSV, Sequence, Avro, RC, ORC, JSON and Parquet.
- Experience with big data ingestion tools such as Sqoop, Flume and Apache Kafka.
- Experience in using Flume and Kafka to load log data from multiple sources into HDFS.
- Expert in implementing advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala (a minimal Spark sketch follows this summary).
- Hands-on experience with NoSQL databases such as HBase, MongoDB and Cassandra.
- Experience in retrieving data from databases such as MySQL, Teradata, Informix, DB2 and Oracle into HDFS using Sqoop, and ingesting it into HBase and Cassandra.
- Hands-on experience in setting up workflows with the Apache Oozie workflow engine for managing and scheduling Hadoop jobs.
- Involved in the design and development of various web and enterprise applications using technologies such as JSP, Servlets, Struts, Hibernate, Spring, JDBC, JSF, XML, JavaScript, HTML, AJAX, SOAP and Amazon Web Services.
- Good understanding of and experience with software development methodologies such as Agile and Waterfall.
- Effective leadership qualities with good skills in strategy, business development, client management and project management.
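A minimal sketch of the kind of in-memory Spark text processing in Scala referenced above; the input path and application name are placeholder assumptions, not taken from any project described here.

```scala
import org.apache.spark.sql.SparkSession

object TextAnalyticsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TextAnalyticsSketch")
      .getOrCreate()

    // Count word frequencies in memory with the classic RDD
    // transformation/action pipeline (flatMap, filter, map, reduceByKey).
    val counts = spark.sparkContext
      .textFile("hdfs:///data/raw/notes") // assumption: placeholder input path
      .flatMap(_.toLowerCase.split("\\W+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1L))
      .reduceByKey(_ + _)

    // Action: bring the top 20 terms back to the driver.
    counts.sortBy(_._2, ascending = false).take(20).foreach(println)

    spark.stop()
  }
}
```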
PROFESSIONAL EXPERIENCE:
HADOOP DEVELOPER
Confidential - EDEN PRAIRIE, MINNESOTA.
Responsibilities:
- Worked on analyzing the Hadoop stack and various big data analytic tools, including Pig, Hive, HBase and Sqoop.
- Responsible for data extraction and data ingestion from different data sources into HDFS by creating ETL pipelines.
- Utilized Spark SQL to extract and process data, parsing it with Datasets or RDDs and applying transformations and actions (map, flatMap, filter, reduce, reduceByKey).
- Achieved faster processing and testing of data by implementing Spark and Spark SQL in Scala.
- Extended the capabilities of DataFrames using user-defined functions in Scala.
- Worked on fine-tuning Spark applications to improve overall pipeline processing time.
- Designed and implemented an end-to-end search service solution using Elasticsearch.
- Built and delivered a REST service for a custom search service on Elasticsearch.
- Developed an index manager for Elasticsearch that controls the size of indices even when millions of records are ingested per day.
- Developed Sqoop jobs to process terabytes of CSV-format data, using Scala and shell commands as per requirements.
- Developed a data pipeline using Sqoop to ingest claims data from AS400 tapes into HDFS for analysis and to handle incremental loads from RDBMS into HDFS, and applied Spark transformations.
- Began using Apache NiFi to copy data from the local file system to HDFS.
- Extracted data from Teradata as CSV files, converted it to JSON using Spark in Scala, and ingested it into Elasticsearch (see the indexing sketch after this list).
- Loaded Parquet files into Spark for preprocessing and ingested the results into Elasticsearch.
- Continuously monitored and managed the Elasticsearch cluster using the ELK stack.
- Created indices in Elasticsearch with different mappings as per requirements.
- Developed Elasticsearch queries using analyzers and features such as phonetic analysis and fuzziness.
- Implemented Elasticsearch APIs such as Bulk, Delete By Query, Aliases and Reindex.
- Used Jenkins to set up Spark jobs that run daily to load data into Elasticsearch.
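A minimal sketch of the CSV-to-Elasticsearch ingestion step, assuming the elasticsearch-spark (elasticsearch-hadoop) connector is on the classpath; the host, paths and index name are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark.sql._ // elasticsearch-spark connector

object ClaimsToElastic {
  def main(args: Array[String]): Unit = {
    // Cluster address is an illustrative placeholder.
    val spark = SparkSession.builder()
      .appName("TeradataCsvToEs")
      .config("es.nodes", "es-host:9200")
      .getOrCreate()

    // Read the Teradata extract (CSV with a header row) into a DataFrame.
    val claims = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///landing/teradata/claims/*.csv") // assumption: landing path

    // The connector serializes each row as a JSON document and bulk-indexes it.
    claims.saveToEs("claims/doc") // assumption: index/type name

    spark.stop()
  }
}
```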
Environment: MapR, Hadoop, HDFS, Hive, Pig, Sqoop, SQL, Spark, Spark SQL, NiFi, HBase, Elasticsearch, Logstash, Kibana, Scala, OpenShift, Jenkins, GitHub.
HADOOP DEVELOPER
Confidential - CARY, NORTH CAROLINA.
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop .
- Developed ETL jobs to load data coming from various sources, such as mainframes and flat files, into a data warehouse.
- Implemented Storm topologies to perform cleansing operations before moving data into Cassandra.
- Configured Hive, Pig, Impala, Sqoop, Flume and Oozie in CDH5.
- Used Sqoop to import data into Cassandra tables from different relational databases, and imported data from various sources into the Cassandra cluster using Java APIs.
- Created Cassandra tables to load large sets of structured, semi-structured and unstructured data coming from Linux, NoSQL and a variety of portfolios.
- Involved in creating data models for customer data using Cassandra Query Language (CQL).
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Wrote MapReduce code to turn semi-structured data into structured data and to insert data from HDFS into HBase.
- Developed JUnit tests for MapReduce jobs and performed testing with small sample data.
- Developed Perl and PowerShell scripts for automation.
- Interacted with business users to gather business requirements.
- Supported HBase architecture design with the Hadoop architect team to develop a database design in HDFS.
- Transformed and aggregated data for analysis by implementing workflow management of Sqoop, Hive and Pig scripts.
- Wrote scripts to automate application deployments and configuration and to monitor YARN.
- Experience in developing custom UDFs for Pig and Hive to incorporate Python methods and functionality into Pig Latin and HiveQL.
- Wrote Hive queries for data analysis to meet the business requirements.
- Involved in creating Hive tables, loading them with data and writing Hive queries.
- Involved in importing data from Oracle tables to HDFS and HBase tables using Sqoop .
- Streamed data in real time using Spark with Kafka for faster processing.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS using Scala (see the streaming sketch after this list).
- Involved in converting Cassandra, Hive and SQL queries into Spark transformations using Spark RDDs in Scala.
- Involved in POCs comparing the performance of Spark SQL with Hive.
- Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions.
- Developed a Spark Streaming script that consumes topics from the distributed messaging source Kafka and periodically pushes batches of data to Spark for real-time processing.
- Extensively used components such as tWaitForFile, tIterateToFlow, tFlowToIterate, tHashOutput, tHashInput, tMap, tRunJob, tJava, tNormalize and the tFile family to create Talend jobs.
- Developed Talend jobs to move inbound files to HDFS locations based on monthly, weekly, daily and hourly partitioning.
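A minimal sketch of the Kafka-to-HDFS Spark Streaming step in Scala, assuming the spark-streaming-kafka-0-10 integration; the broker list, topic, group id, batch interval and output path are hypothetical.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfsSketch")
    val ssc  = new StreamingContext(conf, Seconds(60)) // assumption: 60s batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092", // assumption: broker address
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "hdfs-loader",
      "auto.offset.reset"  -> "latest"
    )

    // Direct stream from Kafka, one partition per Kafka partition.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
    )

    // Persist each micro-batch of message values as text files under HDFS.
    stream.map(_.value).saveAsTextFiles("hdfs:///data/streams/events/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```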
Environment: Cloudera, MapReduce, Spark SQL, Spark Streaming, Pig, Hive, Flume, Oozie, Java, Kafka, NiFi, Eclipse, Zookeeper, Cassandra, HBase, Talend, GitHub.
JAVA/HADOOP DEVELOPER
Confidential - Queens, New York.
Responsibilities:
- Developed JSPs, JSF components and Servlets to dynamically generate HTML and display data on the client side.
- Used the Hibernate framework for persistence to an Oracle database.
- Wrote and debugged the Ant scripts for building the entire web application.
- Developed web services in Java; experienced with SOAP and WSDL, and used WSDL to publish services to other applications.
- Developed Perl packages and scripts to access MS SQL Server 2008 databases and to create PDF files and Excel reports for clients.
- Troubleshot existing Perl programs.
- Experience developing XML, XSD, XSL, XSLT, JSON and JAXB components and FreeMarker templates for XML processing.
- Implemented the Java Message Service (JMS) using the JMS API.
- Involved in managing and reviewing Hadoop log files.
- Installed and configured Hadoop, YARN, MapReduce, Flume and HDFS, and developed multiple MapReduce jobs in Java for data cleaning (a cleaning-job sketch follows this list).
- Coded Hadoop MapReduce jobs for energy generation and PS.
- Coded with Servlets, SOAP clients and Apache CXF REST APIs to deliver data from our application to external and internal consumers.
- Expertise in writing Hadoop jobs to analyze data using MapReduce, Apache Crunch, Hive, Pig, Solr and Splunk.
- Created a SOAP web service using JAX-WS to enable clients to consume a SOAP web service.
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems (RDBMS).
- Experienced in designing and developing multi-tier, scalable applications using Java and J2EE design patterns.
- Designed data models in Cassandra and worked with Cassandra Query Language (CQL).
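The MapReduce cleaning jobs above were written in Java; for consistency with the other examples, here is a minimal sketch of the same map-only cleaning pattern in Scala against the Hadoop Java API, with a hypothetical field count and comma-delimited records assumed.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Map-only cleaning job: keep only rows with the expected column count.
class CleaningMapper extends Mapper[LongWritable, Text, NullWritable, Text] {
  private val ExpectedFields = 12 // assumption: record width

  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit = {
    if (value.toString.split(",", -1).length == ExpectedFields)
      context.write(NullWritable.get, value)
  }
}

object CleaningJob {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "data-cleaning")
    job.setJarByClass(classOf[CleaningMapper])
    job.setMapperClass(classOf[CleaningMapper])
    job.setNumReduceTasks(0) // map-only: malformed rows are simply dropped
    job.setOutputKeyClass(classOf[NullWritable])
    job.setOutputValueClass(classOf[Text])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```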
Environment: MapR, Java, HTML, JavaScript, SQL Server, PL/SQL, JSP, Spring, Hibernate, Web Services, SOAP, SOA, JSF, JMS, JUnit, Oracle, Eclipse, SVN, XML, CSS, Log4j, Ant, Apache Tomcat.
JAVA DEVELOPER
Confidential
Responsibilities:
- Involved in projects utilizing Java and Java EE web applications to create fully integrated client management systems.
- Developed the UI using HTML, JavaScript and JSP, and developed business logic and interfacing components using Business Objects, JDBC and XML.
- Participated in user requirement sessions to analyze and gather business requirements.
- Developed the user-facing site using Perl, back-end admin sites using Python and big data components using core Java.
- Involved in developing the application using Spring Web MVC and other components of the Spring framework; also implemented dependency injection using the Spring framework.
- Elaborated use cases based on business requirements and was responsible for creating class diagrams and sequence diagrams.
- Implemented object-relational mapping in the persistence layer using the Hibernate ORM framework.
- Implemented REST web services with the Jersey API to handle customer requests (see the REST sketch after this list).
- Experienced in developing RESTful web services, both consumed and produced.
- Used Hibernate for the database connection and Hibernate Query Language (HQL) to add and retrieve information from the database.
- Implemented Spring JDBC to connect to the Oracle database.
- Designed the application using the MVC pattern for easy maintainability.
- Provided bug fixing and testing for existing web applications.
- Involved in the full system life cycle and responsible for developing, testing and implementing.
- Involved in Unit Testing, Integration Testing and System Testing.
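The project itself was Java-based; for consistency with the other examples, this sketch of a Jersey-style (JAX-RS) REST resource is written in Scala. The resource path, parameter and payload are illustrative assumptions.

```scala
import javax.ws.rs.{GET, Path, PathParam, Produces}
import javax.ws.rs.core.{MediaType, Response}

// Hypothetical resource; the real services and payloads are not
// described above, so names here are illustrative only.
@Path("/customers")
class CustomerResource {

  @GET
  @Path("/{id}")
  @Produces(Array(MediaType.APPLICATION_JSON))
  def getCustomer(@PathParam("id") id: String): Response = {
    // Look up the customer (persistence layer omitted) and return JSON.
    val body = s"""{"id": "$id", "status": "ACTIVE"}"""
    Response.ok(body).build()
  }
}
```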
Environment: Java, Spring, JSP, Servlets, REST, Oracle, AJAX, JavaScript, jQuery, WebLogic, Log4j, HTML, XML, CVS, Eclipse, SOAP Web Services, UNIX, Maven, Mockito, JUnit, shell scripting, MVS.
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, MapReduce, YARN, Pig, Hive, HBase, Flume, Sqoop, Impala, Oozie, NiFi, Zookeeper, Spark, Ambari, MongoDB, Cassandra, Kafka.
Hadoop Distributions: MapR, Cloudera (CDH3, CDH4, and CDH5), Hortonworks.
NoSQL Databases: Elasticsearch, Cassandra, HBase.
Java: J2EE, JSP, Servlets, JavaScript.
Databases: MySQL, Oracle, DB2 for Mainframes, Teradata.
Operating Systems: Windows, UNIX, Linux.
Other Tools: PuTTY, WinSCP.
Languages: Java, Scala, SQL, HTML, JavaScript, JDBC, XML.
App/Web servers: WebSphere, WebLogic, Tomcat.