
Hadoop Engineer Lead Resume


NY

SUMMARY

  • Over 11 years of diversified experience in software design and development, including experience as a Hadoop developer solving business use cases for several clients, with expertise in backend applications.

TECHNICAL SKILLS

Programming Languages: Java, Scala, Python

Hadoop Ecosystem: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, ZooKeeper, HBase, Impala, Oozie, Hue, MongoDB

ETL Tools: Informatica PowerCenter 6.1/7.1/9.1

Operating Systems: MS-DOS, Windows 95/98/NT/XP/7, Linux, Unix

Web Technologies: JSP, JDBC, CSS

Databases: Oracle, MySQL

Application/Web Servers: Apache Tomcat 4.0, WebLogic, TFS

Functional Testing Tools: QuickTest Professional, Selenium, LoadRunner, Quality Center, HP ALM, JIRA

PROFESSIONAL EXPERIENCE

Confidential, NY

Hadoop Engineer Lead

Environment: Windows 7/Linux, Hadoop 2.0, YARN, SharePoint 2014, Amazon S3, Hive, Pig, MapReduce, Impala, Sqoop, Flume, MongoDB, ZooKeeper, Kafka, HBase, PuTTY, MySQL, Cloudera, Agile, shell scripting, Java, Scala

Responsibilities:

  • Set up a multi-cluster environment running CDH 5.11.
  • Worked with highly unstructured and semi-structured data of 2048 TB in size.
  • Set up both prod and dev nodes, 120 nodes in total; HDFS storage spans 80 nodes with 12 disks per node of 3.7 TB each.
  • Involved in loading data from external sources into target tables using Impala queries.
  • Involved in setting up the MongoDB cluster, including user roles and security.
  • Worked on developing JVM applications with Scala and Java.
  • Extensive understanding of Hortonworks (HDP 2.6).
  • Created an Oozie workflow for data ingestion that runs every week.
  • Involved in the sentiment analysis process with Spark Streaming: carried out sentiment analysis on Facebook, Twitter, and Instagram data ingested through Flume, processed the jobs in Spark (Java/Scala), and stored the results in HBase and MongoDB.
  • Involved in reading and writing data from Hive through Spark SQL and DataFrames (a short sketch follows this list).
  • Set up monitoring, alerts, and events on CDH with Linux/shell scripting.
  • Involved in setting up and monitoring all services running on the cluster, including HBase, Flume, Impala, Hive, Pig, and Kafka.
  • Secured the cluster with Kerberos.
  • Used the Spark machine learning library (MLlib) to train models for sentiment analysis on social media data.
  • Worked on processing customer feedback data from Press Ganey surveys with Spark (Scala) and storing it in Hive tables for further analysis with Tableau.
  • Involved in importing data from RDBMS sources with Sqoop jobs and troubleshooting any jobs that failed to finish.
  • Involved in the complete ETL process, loading data from external vendors into target tables.
  • Involved in implementing GitHub as the source versioning and code migration repository.
  • Pulled files from Amazon S3 buckets to the prod cluster and stored them in Amazon Redshift.
  • Involved in granting internal and external users access to Cloudera Manager, Hue, and Oozie by creating SR requests.
  • Granted and revoked user privileges on both the dev and prod clusters.
  • Ran daily health tests on the services running on Cloudera.
  • Downloaded data from FTP with shell scripts to the local clusters for loading into target tables using partitioning and bucketing techniques.
  • Set up jobs in crontab and created Oozie workflows and coordinators.
  • Monitored all crontab and Oozie jobs, debugging issues when any job failed to complete.
  • Involved in data analytics and reporting with Tableau.
  • Sent daily status of the services running on the cluster to the scrum master.
  • Involved in creating documentation for all jobs running on the cluster.
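
A minimal Java sketch of the Spark SQL / DataFrame read-and-write path against Hive referenced above. It assumes a Spark 2.x session with Hive support enabled on the cluster; the database, table, and column names are placeholders, not the actual production objects.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.col;

    public class FeedbackLoad {
        public static void main(String[] args) {
            // Hive-enabled Spark session (assumes Spark 2.x on the cluster)
            SparkSession spark = SparkSession.builder()
                .appName("feedback-load")
                .enableHiveSupport()
                .getOrCreate();

            // Read survey feedback from a Hive table into a DataFrame
            Dataset<Row> feedback = spark.sql(
                "SELECT survey_id, unit, comment_text, received_dt FROM staging.survey_feedback");

            // Light cleanup before analysis, then write back to Hive for Tableau
            Dataset<Row> cleaned = feedback
                .filter(col("comment_text").isNotNull())
                .dropDuplicates(new String[] {"survey_id"});

            cleaned.write()
                .mode("overwrite")
                .partitionBy("received_dt")
                .saveAsTable("analytics.feedback_clean");

            spark.stop();
        }
    }

Such a job would be packaged as a jar and submitted with spark-submit against the cluster's YARN resource manager.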

Confidential, Ridgefield, NJ

Big Data/Hadoop Engineer Lead

Environment: Windows 7/Linux, SharePoint 2014, Hadoop 2.0, Hive, Pig, MapReduce, Sqoop, ZooKeeper, TFS, VS 2015, PuTTY, MySQL, Cloudera, Agile, Teradata, shell scripting, Java, Scala, Sentry

Responsibilities:

  • Worked on a live 60-node Hadoop cluster running CDH 5.8.
  • Worked with highly unstructured and semi-structured data of 1900 TB in size (270 GB with a replication factor of 3).
  • Extracted data from Teradata/RDBMS into HDFS using Sqoop (version 1.4.6).
  • Created and ran Sqoop (version 1.4.6) jobs with incremental load to populate Hive external tables.
  • Involved in setting up Amazon Redshift and importing data from RDBMS sources into Redshift.
  • Worked on Spark Streaming with Kafka; ingested data with Spark SQL and created Spark DataFrames (a short sketch follows this list).
  • Extensive experience in writing Pig (version 0.15) scripts to transform raw data from several big data sources into baseline data sets.
  • Developed Hive (version 1.2.1) scripts for end-user/analyst requirements to perform ad hoc analysis.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Solved performance issues in Hive and Pig scripts through an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
  • Strong knowledge of multi-clustered environments and setting up the Cloudera Hadoop ecosystem; experience in installation, configuration, and management of Hadoop clusters.
  • Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
  • Worked on managing big data/Hadoop logs.
  • Developed shell scripts for Oozie workflows.
  • Worked on the BI reporting tool Talend for generating reports.
  • Integrated Talend with Hadoop for processing big data jobs.
  • Good knowledge of Solr and Kafka.
  • Shared responsibility for administration of Hadoop, Hive and Pig.
  • Installed and configured Storm, Solr, Flume, Sqoop, Pig, Hive, HBase on Hadoop clusters.
  • Provided detailed reporting of work as required by project status reports.
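
A minimal Java sketch of the Spark Streaming ingestion from Kafka into DataFrames mentioned above. It assumes Spark 2.x Structured Streaming with the spark-sql-kafka connector on the classpath; the broker addresses, topic name, and HDFS paths are placeholders.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;

    public class KafkaIngest {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder()
                .appName("kafka-ingest")
                .getOrCreate();

            // Subscribe to a Kafka topic (placeholder brokers and topic)
            Dataset<Row> raw = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
                .option("subscribe", "ingest.events")
                .load();

            // Kafka delivers key/value as binary; cast the payload to a string
            Dataset<Row> events = raw.selectExpr(
                "CAST(value AS STRING) AS payload", "timestamp");

            // Land micro-batches as Parquet files for downstream Hive queries
            StreamingQuery query = events.writeStream()
                .format("parquet")
                .option("path", "/data/landing/events")
                .option("checkpointLocation", "/data/checkpoints/events")
                .start();

            query.awaitTermination();
        }
    }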

Confidential, Indianapolis, IN

Big Data/ Hadoop Developer

Environment: Windows 7/Linux/Unix, SharePoint 2013, Hadoop 2.0, Eclipse, Hive, Pig, MapReduce, Sqoop, HBase, ZooKeeper, HP ALM, PuTTY, Oracle/Teradata, Cloudera, Agile

Responsibilities:

  • Design and implementation of ETL processes in big data/Hadoop ecosystems.
  • Hands-on experience with Cloudera and migrating big data from Oracle with Sqoop (version 1.4.3).
  • Very good experience with both MapReduce 1 (JobTracker/TaskTracker) and MapReduce 2 (YARN).
  • Worked on importing data from Teradata/Oracle with Sqoop (version 1.4.3).
  • Implemented a de-duplication process to avoid duplicates in the daily load (a short sketch follows this list).
  • Developed several advanced MapReduce (Java) and Python programs as part of functional requirements for big data.
  • Developed Hive (version 1.1.1) scripts as part of functional requirements and worked on Hadoop security with Kerberos.
  • Worked with the admin team in designing and executing the upgrade from CDH 3 to CDH 4.
  • Developed UDFs in Java as needed for use in Pig and Hive queries.
  • Experience in using SequenceFile, RCFile, and Avro file formats.
  • Developed Oozie workflows for scheduling and orchestrating the ETL process.
  • Processed and analyzed ETL/big data jobs in Talend.
  • Successfully tested and generated reports in Talend.
  • Successfully integrated Hive tables with the MySQL database.
  • Worked in a UNIX-based environment for data operations.
  • Experience working on NoSQL databases, including HBase.
  • Experience in deploying code changes using TeamCity builds.
  • Involved in handling code fixes during production releases.
  • Managed Hadoop clusters, including adding and removing cluster nodes for maintenance and capacity needs.
  • Provided detailed reporting of work as required by project status reports.
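
A minimal Java MapReduce sketch of the kind of de-duplication step mentioned above for the daily load. Input and output paths come from the command line; treating the whole input line as the record key is an assumption made for illustration.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class DailyLoadDedup {

        // Map each record to (record, null); identical records share one key
        public static class DedupMapper extends Mapper<Object, Text, Text, NullWritable> {
            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                context.write(value, NullWritable.get());
            }
        }

        // Emit each distinct record exactly once
        public static class DedupReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
            @Override
            protected void reduce(Text key, Iterable<NullWritable> values, Context context)
                    throws IOException, InterruptedException {
                context.write(key, NullWritable.get());
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "daily-load-dedup");
            job.setJarByClass(DailyLoadDedup.class);
            job.setMapperClass(DedupMapper.class);
            job.setCombinerClass(DedupReducer.class);   // safe: the reducer is idempotent
            job.setReducerClass(DedupReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(NullWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }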

Confidential, Folsom, CA

Big Data/ Hadoop Developer

Environment: Windows 7, SharePoint 2013, Hadoop 1.0, Eclipse, Pig, Hive, Flume, Sqoop, HBase, PuTTY, HP ALM, WinSCP, Agile, MySQL

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop and migrating legacy retail application ETL to Hadoop.
  • Accessed information from the equipment through mobile networks and satellites.
  • Hands-on extracting data from different databases and copying it into HDFS using Sqoop.
  • Implemented ETL code to load data from multiple sources into HDFS using Pig scripts.
  • Hands-on creating applications on social networking websites and obtaining access to data from them.
  • Developed simple to complex MapReduce jobs using Hive and Pig for analyzing the data.
  • Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources to derive results from the data (a short sketch follows this list).
  • Worked with cloud services such as Amazon Web Services (AWS).
  • Used the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Hands-on exporting the analyzed data into relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Involved in installing and configuring Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
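
A minimal Java sketch of a Pig UDF of the kind referenced above; the class name and the normalization it performs are illustrative only, not the actual business logic.

    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // Illustrative Pig UDF: trims and upper-cases a chararray field
    public class TrimUpper extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            return input.get(0).toString().trim().toUpperCase();
        }
    }

In a Pig script the jar would be registered with REGISTER and the function called like any built-in, e.g. FOREACH A GENERATE TrimUpper(name).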

Confidential, Warren, MI

ETL/ Informatica/Java Developer

Environment: Windows XP/NT, Informatica 6.2, UNIX, Java, Oracle, SQL, PL/SQL

Responsibilities:

  • Participated in documenting the existing operational systems.
  • Involved in the requirements gathering for the warehouse. Presented the requirements and a design document to the client.
  • Created ETL jobs to load data from the staging area into the data warehouse.
  • Analyzed the requirements and framed the business logic for the ETL process.
  • Involved in the ETL design and its documentation.
  • Experience with manufacturing units and their activities, such as planning, purchase, and sale activities.
  • Involved in the creation of use cases for purchase and sale.
  • Designed and developed complex aggregate, join, and lookup transformation rules (business rules) to generate consolidated (fact/summary) data using Informatica PowerCenter 6.0.
  • Designed and developed mappings using Source Qualifier, Aggregator, Joiner, Lookup, Sequence Generator, Stored Procedure, Expression, Filter, and Rank transformations.
  • Developed pre-session, post-session, and batch execution routines using Informatica Server to run Informatica sessions.
  • Evaluated the level of granularity.
  • Evaluated slowly changing dimension tables and their impact on the overall data warehouse, including changes to source-to-target mappings, the transformation process, the database, etc.
  • Collected and linked metadata from diverse sources, including relational databases (Oracle), XML, and flat files.
  • Created, optimized, reviewed, and executed Teradata SQL test queries to validate transformation rules used in source-to-target mappings/source views and to verify data in target tables (a short sketch follows this list).
  • Extensive experience with PL/SQL in designing and developing functions, procedures, triggers, and packages.
  • Developed Informatica mappings, reusable sessions, and mapplets for data loads to the data warehouse.
  • Designed and developed Informatica mappings and workflows; identified and removed bottlenecks to improve the performance of mappings and workflows, and used the Debugger to test mappings and fix bugs.
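
An illustrative Java/JDBC sketch of the kind of source-to-target row-count check behind the Teradata test queries mentioned above. The connection URL, credentials, and table names are placeholders, and it assumes the Teradata JDBC driver is on the classpath; in practice such queries may also be run directly from a SQL client.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class RowCountCheck {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details; real values come from configuration
            String url = "jdbc:teradata://td-host/DATABASE=edw";
            try (Connection conn = DriverManager.getConnection(url, "user", "password");
                 Statement stmt = conn.createStatement()) {

                long src = count(stmt, "SELECT COUNT(*) FROM stg.orders_src WHERE load_dt = CURRENT_DATE");
                long tgt = count(stmt, "SELECT COUNT(*) FROM dw.orders_fact WHERE load_dt = CURRENT_DATE");

                System.out.println("source=" + src + ", target=" + tgt
                    + (src == tgt ? " -> counts match" : " -> MISMATCH, review mapping"));
            }
        }

        // Run a single-value COUNT(*) query and return the result
        private static long count(Statement stmt, String sql) throws Exception {
            try (ResultSet rs = stmt.executeQuery(sql)) {
                rs.next();
                return rs.getLong(1);
            }
        }
    }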

Confidential

Software Engineer

Environment: Informatica 6.1, PL/SQL, MS Access, Oracle, Windows, Unix, Java 1.8, RESTful web services, SOA, Spring, Ajax, JavaScript, CSS 3, JSP, Servlet, JSTL, JPA, Hibernate, JUnit, MySQL, Tomcat, JSON

Responsibilities:

  • Extensively worked with the data modelers to implement logical and physical data modeling to create an enterprise-level data warehouse.
  • Created and modified T-SQL stored procedures for data retrieval from the MS SQL Server database (a short sketch follows this list).
  • Automated mappings to run using UNIX shell scripts, which included pre- and post-session jobs, and extracted data from the transaction system into the staging area.
  • Extensively used Informatica PowerCenter 6.1/6.2 to extract data from various sources and load it into the staging database.
  • Extensively worked with Informatica tools - Source Analyzer, Warehouse Designer, Transformation Developer, Mapplet Designer, Mapping Designer, Repository Manager, Workflow Manager, Workflow Monitor, Repository Server, and Informatica Server - to load data from flat files and legacy data.
  • Created mappings using transformations such as Source Qualifier, Aggregator, Expression, Lookup, Router, Filter, Rank, Sequence Generator, Update Strategy, Joiner, and Stored Procedure.
  • Designed the mappings between sources (external files and databases) and operational staging targets.
  • Involved in data cleansing, mapping transformations, and loading activities.
  • Developed Informatica mappings and tuned them for optimum performance, dependencies, and batch design.
  • Involved in the process design documentation of the data warehouse dimensional upgrades. Extensively used Informatica for loading historical data from various tables for different departments.
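
An illustrative Java/JDBC sketch of calling a T-SQL stored procedure for data retrieval, as mentioned above. The procedure name, parameter, and connection details are hypothetical, and it assumes the Microsoft SQL Server JDBC driver is available on the classpath.

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    public class CustomerLookup {
        public static void main(String[] args) throws Exception {
            // Placeholder URL, credentials, and procedure name
            String url = "jdbc:sqlserver://dbhost:1433;databaseName=sales";
            try (Connection conn = DriverManager.getConnection(url, "user", "password");
                 CallableStatement call = conn.prepareCall("{call dbo.usp_get_customer(?)}")) {

                call.setInt(1, 42); // hypothetical customer id parameter
                try (ResultSet rs = call.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("customer_name"));
                    }
                }
            }
        }
    }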

Confidential

Software Engineer

Environment: ETL, Oracle 8i, Windows 95/NT, UNIX, Tomcat, XML, Java, Servlets, JSP, Akka

Responsibilities:

  • Extensively worked on Informatica tools such as the Designer (Source Analyzer, Warehouse Designer, Mapping Designer, Transformations), Workflow Manager, and Workflow Monitor.
  • Developed mappings using the needed transformations in the tool according to technical specifications.
  • Resolved performance issues with Informatica transformations and mappings with the help of technical specifications.
  • Created and ran UNIX scripts for all pre- and post-session ETL jobs.
  • Participated in reviews of the test plan, test cases, and test scripts prepared by the system integration testing team.
  • Developed a number of complex Informatica mappings and reusable transformations.
  • Extensively used various transformations such as Aggregator, Expression, connected and unconnected Lookups, and Update Strategy to load data into targets.
  • Experience in debugging and performance tuning of targets, sources, mappings, and sessions.
