Hadoop Developer Resume
Burlington, NC
SUMMARY
- Passionate developer with solid experience in developing, configuring, and implementing Hadoop and Big Data ecosystems on various platforms, as well as in the development and documentation of various applications.
- Hands-on expertise in ETL tools for big data integration, including importing and exporting data between relational database systems (RDBMS) and HDFS using Sqoop.
- Good knowledge of setting up job streaming and scheduling with Oozie, and of working with messaging systems such as Kafka integrated with ZooKeeper.
- Sound knowledge of real-time data streaming solutions using Apache Spark Streaming and Kafka (see the sketch after this list).
- Experienced in developing simple to complex MapReduce jobs using Hadoop technologies to handle files in multiple formats (JSON, Parquet, XML, Avro, 834 files, etc.).
- Hands-on experience verifying cleansed data in Talend; also worked with the Talend Administrative Console to add users and schedule jobs.
- Able to work independently or in a group; dedication combined with strong communication skills helps meet deadlines.
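For illustration only, a minimal sketch of the kind of Spark Streaming plus Kafka consumer referenced above; the broker address, topic name, group id, and batch interval are placeholder assumptions, not details from an actual engagement.

```scala
// Minimal Spark Streaming consumer for a Kafka topic (sketch only).
// Broker, topic, group id, and batch interval below are illustrative placeholders.
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-stream-sketch")
    val ssc  = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",          // placeholder broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "stream-sketch",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Stand-in for real processing: count records per micro-batch.
    stream.map(_.value).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```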
TECHNICAL SKILLS
Big Data Ecosystems: HDFS, MapReduce, YARN, ZooKeeper, Hive, Pig, Oozie, Sqoop, Kafka, Storm, Apache Spark, Apache Tez, Splunk, Impala, NiFi, Flume, Apache Solr, Kylin, Apache Zeppelin
Databases: MySQL, SQL Server, DB2, Teradata, Netezza, Postgres
NoSQL: Cassandra, HBase, MongoDB
Business Intelligence Tools: Tableau, Zoom Data, Power BI
Cloud Technologies: Amazon Web Services (AWS), CDH 5, HDP 2.5, Hortonworks, Pivotal
PROFESSIONAL EXPERIENCE
Confidential, Burlington, NC
Hadoop Developer
Responsibilities
- Played a key role in gathering the requirements and in the architectural design of the data flow.
- Played a major role in building a real-time data ingestion pipeline that moves data from IBM MQ to Hadoop.
- Worked on setting up Flume as a real-time data ingestion tool feeding partitioned Hive tables.
- Responsible for parsing unstructured XML data into a structured format, applying business parsing rules and exclusion rules as required by the downstream applications.
- Worked on building a data pipeline using Talend Real-Time as the ETL tool, with Spark as the execution engine.
- Worked on building transformation logic on the datasets, standardizing the data in Hive partitioned tables (see the sketch after this list).
- Worked on implementing data loads from Hive tables to SAP HANA.
- Developed real-time data ingestion from IBM MQ into Hadoop using Spark Streaming.
- Worked on setting up Flume, optimizing and fine-tuning its parameters according to the requirements.
- Worked on loading and storing PDF files on Amazon S3 from Hadoop as per the requirements.
- Worked on cleaning and parsing different file formats such as Parquet, Avro, HL7, and XML files.
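A minimal sketch of the standardization step described above, assuming the upstream parser has already produced structured records; the database, table, column names, and staging path are hypothetical.

```scala
// Sketch: load already-parsed records into a date-partitioned Hive table via Spark SQL.
// Database, table, columns, and the staging path are hypothetical examples.
import org.apache.spark.sql.SparkSession

object HivePartitionLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partition-load-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Structured output of the parsing stage (placeholder location).
    spark.read.parquet("/data/staging/records_parsed")
      .createOrReplaceTempView("records_parsed")

    // Dynamic-partition insert keyed on load_date.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT INTO TABLE edw.records PARTITION (load_date)
        |SELECT record_id, source_system, payload, load_date
        |FROM records_parsed""".stripMargin)

    spark.stop()
  }
}
```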
Environment: Cloudera Distribution CDH 5.12, Hive, Impala, Apache Kylin, Amazon S3, Talend Real-Time, Spark SQL, Apache Flume, Tableau, IBM MQ, GitHub.
Confidential, Long Beach, CA
Hadoop Developer
Responsibilities
- Involved in requirement analysis, architecture of the EDP (Enterprise Data Platform), design, and coding.
- Played a major role in building a near-real-time CDC data pipeline that brings data from a transactional MySQL server system to Hadoop, using Kafka as the messaging system.
- Worked on a logging framework around the data pipeline into Hadoop for future analysis.
- Responsible for data integrity checks, data validation checks, and data certification checks in order to provide quality data to the downstream applications.
- Worked on building a data pipeline using Talend Big Data as the ETL tool, including custom Java code that validates the data in the pipeline.
- Worked on implementing Spark SQL to build datasets involving complex joins of Hive partitioned tables (see the sketch after this list).
- Involved in building a validation framework around the data present in the Enterprise Data Platform.
- Worked on CVD (cardiovascular disease) predictive analysis using the QRISK2 model.
- Developed efficient bulk-loading techniques for moving huge datasets into HBase and Phoenix tables, meeting industry-range benchmarks.
- Worked on setting up the HDP cluster, optimizing and fine-tuning the memory requirements of the various environments according to the Enterprise Data Platform requirements.
- Worked on bringing unstructured data (image files and email data) into HDFS and automating the process, delivering business value by making the claims process far more efficient than the traditional systems.
- Worked on cleaning and parsing different file formats such as Parquet, Avro, CSV, and XLSB files.
- Worked on publishing and deploying the code to Nexus and scheduling the jobs through Autosys.
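A minimal sketch of the kind of Spark SQL join over Hive partitioned tables mentioned above; database, table, and column names are hypothetical.

```scala
// Sketch: Spark SQL join across Hive partitioned tables with a partition-pruning filter.
// Database, table, and column names are hypothetical.
import org.apache.spark.sql.SparkSession

object PartitionedJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partitioned-join-sketch")
      .enableHiveSupport()
      .getOrCreate()

    val daily = spark.sql(
      """SELECT c.customer_id, c.segment, SUM(t.amount) AS total_amount
        |FROM edp.transactions t
        |JOIN edp.customers c ON t.customer_id = c.customer_id
        |WHERE t.txn_date = '2017-06-01'   -- restrict to the partition being processed
        |GROUP BY c.customer_id, c.segment""".stripMargin)

    // Persist the derived dataset back to the warehouse for downstream consumers.
    daily.write.mode("overwrite").saveAsTable("edp.customer_daily_totals")

    spark.stop()
  }
}
```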
Environment: HDP 2.4, Cloudera Distribution CDH 5.12, Kafka, Hive, Tez, HBase, Apache Phoenix, Impala, Apache Kylin, Talend Big Data, Spark SQL, Apache Zeppelin, Zoom Data, Apache Pig, NiFi, SQuirreL SQL, Gitblit.
Confidential, Sunnyvale, CA
Hadoop Developer
Responsibilities
- Played a key role in discussing the requirements and analyzing the entire system, along with estimation, development, and testing, keeping BI requirements in view.
- Involved in developing and installing Sqoop, Hive, and FTP integrations to downstream systems.
- Performed data imports from various sources into the Cassandra cluster using Sqoop.
- Built a Sqoop job to help import data in different formats, such as Avro and XML, obtained from different vendors.
- Developed UNIX shell scripts for the business process and the assimilation of data from different interfaces.
- Developed Sqoop scripts for writing the processed data into HBase tables, which helps the BI team with data visualization.
- Established an Oozie job to implement a job scheduler that runs on a daily basis.
- Worked on debugging and performance tuning of MapReduce, Hive, and Sqoop jobs.
- Involved in diagnosing different possible ways to optimize and improve the efficiency of the Hadoop cluster.
- Developed multiple POCs using Scala, deployed them on the cluster, and compared the performance of Spark with MapReduce (see the sketch after this list).
- Involved in creating and maintaining the technical documentation for the MapReduce, Hive, Sqoop, and UNIX jobs along with the Hadoop clusters, and in reviewing it to fix post-production issues.
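A minimal example of the kind of Scala POC used to compare Spark against MapReduce: a plain word count over HDFS; the input and output paths are placeholders.

```scala
// POC-style word count in Spark/Scala, of the kind used to compare against MapReduce.
// Input and output HDFS paths are placeholders.
import org.apache.spark.sql.SparkSession

object WordCountPoc {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("wordcount-poc").getOrCreate()
    val sc = spark.sparkContext

    val counts = sc.textFile("hdfs:///poc/input")   // placeholder input path
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1L))
      .reduceByKey(_ + _)

    counts.saveAsTextFile("hdfs:///poc/output")     // placeholder output path
    spark.stop()
  }
}
```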
Environment: Red Hat Enterprise Linux, HBase, Amazon Web Services, Solr, Kafka, MapReduce, Hive, Java SDK, Python, Sqoop, Spark, Scala, SBT, Akka, Maven, Cassandra, GitHub.
Confidential, Pleasanton, CA
Hadoop Developer
Responsibilities
- Involved in setting up the Hadoop clusters, configuration, monitoring, and performance tuning.
- Played a key role in developing and designing a data management system using MySQL, and worked with CSV and JSON files while retrieving the data.
- Hands-on experience with cloud services such as AWS.
- Managed and developed workflows in Oozie that automate data loading into HDFS and pre-processing with Pig.
- Analyzed the data by performing bucketing and partitioning in Hive and by writing Pig Latin scripts to understand customer behavioral patterns (see the sketch after this list).
- Mentored the team in developing Hive queries and in connecting Hive tables through the ODBC connector to surface report data to the front end.
- Implemented Storm integration with Kafka and ZooKeeper for the processing of real-time data.
- Played a key role in exporting and analyzing data from the relational databases and generating reports for the BI team.
- Worked on importing some of the data from NoSQL databases including HBase and Cassandra.
- Monitored system health, reports, and logs in order to act swiftly in case of failure, and alerted the team regarding failures.
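A sketch of the partitioning and bucketing pattern referenced above, issued here through Spark SQL purely for illustration; the database, table, and column names are hypothetical.

```scala
// Sketch: a partitioned, bucketed Hive table plus a partition-pruned query,
// issued through Spark SQL for illustration. All names are hypothetical.
import org.apache.spark.sql.SparkSession

object HiveBucketingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-bucketing-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Partition by day, bucket by customer to speed up joins and sampling.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS analytics.customer_events (
        |  customer_id BIGINT,
        |  event_type  STRING,
        |  amount      DOUBLE)
        |PARTITIONED BY (event_date STRING)
        |CLUSTERED BY (customer_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    // Behavioral-pattern style query restricted to a single partition.
    spark.sql(
      """SELECT event_type, COUNT(*) AS events
        |FROM analytics.customer_events
        |WHERE event_date = '2016-03-01'
        |GROUP BY event_type""".stripMargin).show()

    spark.stop()
  }
}
```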
Environment: Amazon Web Services, MapReduce, Cassandra, Apache Solr, Hive, Java, Pig, Sqoop, Azure, XML, HBase, ASP.NET, Python, Pig Latin.
Confidential, Englewood, CO
Hadoop Developer
Responsibilities
- Played a key role in installing, managing, and supporting Linux operating systems such as CentOS and Ubuntu.
- Involved in setting up Amazon Web Services, clusters, and data buckets on S3, with Hive scripts to process big data (see the sketch after this list).
- Installed and monitored Hadoop Cloudera CDH5 and set up the Cloudera distribution of Hadoop.
- Played a key role in implementing single sign-on solutions using Kerberos.
- Played an important role in extracting data from different data sources and transferring or loading it into Hadoop clusters.
- Played a major role in writing complex Hive queries to move data from the databases into the Hadoop Distributed File System.
- Involved in creating an automation job using Oozie to process data from different data sources into the cluster.
- Set up and monitored the development and production environments.
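A sketch of the S3-plus-Hive pattern mentioned above: an external Hive table over an S3 prefix, queried here through Spark SQL; the bucket name, prefix, and schema are hypothetical.

```scala
// Sketch: an external Hive table backed by S3, queried through Spark SQL.
// Bucket name, prefix, and schema are hypothetical.
import org.apache.spark.sql.SparkSession

object S3HiveTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3-hive-table-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // EXTERNAL keeps the data in S3; dropping the table leaves the files in place.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS staging.web_logs (
        |  ts     STRING,
        |  url    STRING,
        |  status INT)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
        |LOCATION 's3a://example-bucket/web-logs/'""".stripMargin)

    spark.sql("SELECT status, COUNT(*) AS hits FROM staging.web_logs GROUP BY status").show()

    spark.stop()
  }
}
```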
Environment: Hadoop, Oracle 11g, Maven, Pig, JUnit, XML, Hive, HTML, Eclipse IDE, Oracle, AWS, JDBC, SQL.
Confidential
Java Developer
Responsibilities
- Played an important role in gathering all the requirements for the project.
- Developed UI screens using Struts tags in JSP.
- Extended the standard action classes provided by the Struts framework to handle client requests.
- Developed web services client interfaces with JAX-RPC from WSDL files for invoking methods using SOAP.
- Developed RESTful web services for the downstream systems.
- Oracle 11g was used as the database for the application, which was hosted in the cloud on Amazon Web Services.
- Developed the application using Eclipse, which was used for editing, debugging, formatting, and build automation.
- Used Ajax to improve the user experience by providing data while forms in the application are being filled in.
- Developed Ant scripts for the build process and deployed the application on WebLogic Server.
Environment: JDK 1.6, JSP, Ajax, JSTL, JavaScript, AWS, CSS, Dreamweaver CS3, Log4j, JUnit, WebLogic 10.0, Oracle 11g, SOAP, Apache CXF 2.5.2, Ant, Struts 1.3.
Confidential
Java Developer
Responsibilities
- Worked on the development and implementation of the tool.
- Developed JSP custom tags to display data.
- Version maintenance using CVS.
- Implemented UNIX shell scripting.
- Coordinated with the testing team to find and fix bugs before production.
- Implemented test cases and tested the changed application.
Environment: Java, J2EE, Servlets, Struts, JUnit, EJB, BEA WebLogic, JDBC, SQL, UNIX
