Hadoop Developer / Data Engineer Resume
Detroit, Michigan
PROFESSIONAL SUMMARY:
- 7+ years of overall experience as a software developer designing, developing, deploying, administering, and supporting large-scale distributed systems.
- 4+ years of experience with Hadoop ecosystem components such as MapReduce, Pig, Hive, Sqoop, Oozie, Flume, Kafka, Solr, and Spark for data storage and analysis.
- Highly experienced in importing and exporting data between HDFS and relational systems such as MySQL and Teradata using Sqoop.
- Knowledge of the big data store HBase and the NoSQL databases MongoDB and Cassandra.
- Experience working with MapReduce programs, Pig scripts, and Hive queries to deliver the best results.
- Expertise in writing Apache Spark Streaming applications on big data distributions in active cluster environments.
- Developed distributed data processing jobs using the Spark RDD and DataFrame APIs (a brief sketch follows this summary).
- Extensive knowledge of Amazon Web Services (AWS) EC2, S3, and Elastic MapReduce (EMR), as well as Snowflake, Redshift, and Identity and Access Management (IAM).
- Expertise in Java/J2EE technologies such as Core Java, Spring, Hibernate, JDBC, JSON, HTML, Struts, Servlets, JSP, JBoss, and JavaScript.
- Good middleware skills in J2EE and web services with application servers such as Tomcat, BEA WebLogic, IBM WebSphere, and JBoss, with experience on heterogeneous operating systems.
- Proficient in SQL and PL/SQL programming, including triggers, stored procedures, functions, and packages, for application development.
- Experience in writing downstream and upstream pipelines using Python.
- Good experience in creating back-end table cubes to be used by reporting tools.
- Good exposure to automating ETL processes using Python and shell scripts.
- Hands-on experience with AWS cloud services (VPC, EC2, S3, RDS, Redshift, Data Pipeline, EMR, DynamoDB, WorkSpaces, SNS, SQS).
- Good experience with AWS Elastic Block Store (EBS), including choosing among the different volume types based on requirements.
- Worked with and learned a great deal from Amazon Web Services (AWS) cloud services such as EC2, S3, EBS, RDS, and VPC.
- Expertise in relational database concepts and development with multiple RDBMSs, including Oracle, MySQL, and MS SQL Server, and SQL dialects such as PL/SQL.
- Wrote Unix shell scripts to run batch jobs and automate routine system processes.
- Experience using integrated development environments such as Eclipse, NetBeans, JDeveloper, and MyEclipse.
- Managed functional, integration, automation, smoke, and regression testing on maintenance and development projects.
- Created various views in Tableau, including tree maps, heat maps, scatter plots, geographic maps, line charts, and pie charts.
- Followed Agile methodology and Scrum to deliver products with cross-functional teams.
- Responsible for batch processing and real-time processing in HDFS and NoSQL databases.
- Experience in batch processing of large volumes of data for upstream purposes.
- Good knowledge of Excel for aggregating data into presentable results.
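For illustration, a minimal PySpark sketch of the kind of RDD and DataFrame processing described above; the input path, delimiter, and column names are hypothetical placeholders rather than details of any specific project.

```python
# Minimal PySpark sketch: read raw records as an RDD, promote them to a
# DataFrame, and aggregate. Paths and column names are illustrative only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("summary-sketch").getOrCreate()

# RDD API: parse tab-delimited text records (hypothetical path and layout).
raw = spark.sparkContext.textFile("hdfs:///data/events/*.tsv")
rows = (raw.map(lambda line: line.split("\t"))
           .filter(lambda cols: len(cols) == 3))   # drop malformed rows

# DataFrame API: name the columns and aggregate.
df = rows.toDF(["user_id", "event_type", "ts"])
df.groupBy("event_type").count().show()

spark.stop()
```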
TECHNICAL SKILLS:
Business Tools: Tableau 8.x/9.x, Crystal Reports, Teradata 13.x, Teradata SQL Assistant
Big Data: Hadoop, MapReduce, Hive, Pig, HBase, Flume, Sqoop, Solr, Spark, Spark SQL, Impala, HCatalog, Oozie, Zookeeper, Kafka, Storm, Hue, Cassandra, Tez, Mahout.
Languages: Java, Python, SQL, HTML, JavaScript, JDBC, XML, and C/C++.
Databases: DB2, MySQL, MS SQL Server, Vertica, MongoDB, Oracle, SQL Server 2008
Web Development: HTML, HTML5, JavaScript, XML, PHP, JSP, Servlets, JSON.
Application Server: Apache Tomcat, WebLogic, WebSphere, JBoss.
Tools: Eclipse, NetBeans, PuTTY, WinSCP.
Operating System: Mac OS, Unix, Linux (various flavours), Windows XP/2003/7/8/10
PROFESSIONAL EXPERIENCE:
Confidential, Detroit, Michigan
Hadoop Developer/ Data Engineer
Responsibilities:
- Analyzed the requirements and the existing environment to help define the right strategy for migrating Salesforce to the internal FB CRM system.
- Developed data ingestion pipelines to ingest data from the Salesforce scrapes into Hive with the help of the Phoenix framework.
- Developed job flows in Dataswarm to automate the workflow for extracting data from warehouses.
- Designed and implemented a Presto application to evaluate the quality of recommendations made by the engine.
- Worked on mapping the source tables to the target tables.
- Used a data lineage tool to find downstream usage of source table columns so that no downstream consumers were impacted by the migration.
- Worked on core tables of the Revenue Data Feed (RDF), which calculates advertiser revenue for Facebook.
- Extensively worked on data validation between Hive source tables and target tables using automated Python scripts (see the sketch after this list).
- Worked on converting the data ingestion pipelines created by data analysts for the business from Hive to Presto to improve pipeline performance.
- Performed job functions using the Spark APIs in Scala for real-time analysis and fast querying.
- Developed a custom ETL solution with batch processing and a real-time data ingestion pipeline to move data in and out of Hadoop using Python and shell scripts.
- Wrote complex SQL queries, stored procedures, triggers, views, cursors, joins, constraints, DDL, DML, and user-defined functions to implement business logic.
- Worked extensively with data migration, data cleansing, data profiling, and ETL processes for data warehouses.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Good experience developing Hive DDL to create, alter, and drop Hive tables.
- Migrated complex MapReduce programs into Spark RDD transformations and actions.
- Designed and published visually rich and intuitive Tableau dashboards and Crystal Reports for executive decision making.
- Monitored System health and logs and responded accordingly to any warning or failure conditions.
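As a rough illustration of the Hive source-to-target validation mentioned above, the following Python/PySpark sketch compares row counts and key coverage; the database, table, and key column names are assumptions made for illustration, and the actual project scripts may have differed.

```python
# Sketch of a Hive source-vs-target validation: compare row counts and key
# coverage between two Hive tables. All names below are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-migration-validation")
         .enableHiveSupport()
         .getOrCreate())

SOURCE = "legacy_db.accounts"   # assumed source Hive table
TARGET = "crm_db.accounts"      # assumed target Hive table
KEY_COLS = ["account_id"]       # assumed key column(s)

src = spark.table(SOURCE)
tgt = spark.table(TARGET)

# 1) Row-count parity.
src_count, tgt_count = src.count(), tgt.count()
print(f"row counts: source={src_count} target={tgt_count}")

# 2) Keys present in one table but not the other.
missing_in_target = src.select(*KEY_COLS).subtract(tgt.select(*KEY_COLS)).count()
extra_in_target = tgt.select(*KEY_COLS).subtract(src.select(*KEY_COLS)).count()
print(f"missing in target: {missing_in_target}, extra in target: {extra_in_target}")

if src_count != tgt_count or missing_in_target or extra_in_target:
    raise SystemExit("validation failed: source and target tables differ")
print("validation passed")
```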
Environment: Windows & UNIX, Amazon Web Services, AWS S3, Redshift, Apache Hadoop, Hive, Pig, shell scripts, Java, Business Objects reporting tool, CA Rally, Agile methodology.
Confidential, Salem, New Hampshire
Senior Hadoop Developer
Responsibilities:
- Gathered the requirements for building the BIC (Born in Cloud) system.
- Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data into HDFS for analysis.
- Used Flume to handle streaming data and load it into the Hadoop cluster.
- Developed scripts to load data from Hive tables into Redshift (see the sketch after this list).
- Automated all jobs, from pulling data from sources such as Pelican and Revpro and pushing the result datasets to HDFS, to running MapReduce, Pig, and Hive jobs on a daily basis using Zookeeper and Oozie for workflow management.
- Worked on sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance and storage improvements, and used Hive SerDes such as RegEx, JSON, and Avro.
- Worked on ETL scripts to pull data from the Denodo database into HDFS.
- Used Spark DataFrames, Spark SQL, and Spark MLlib extensively.
- Developed Pig Latin scripts and extended Hive and Pig core functionality by writing custom UDFs.
- Used AWS services such as EC2 and S3 for small data sets.
- Hands-on experience with various AWS services, such as Redshift clusters and Route 53 domain configuration.
- Performed data validation between Hive target tables and Redshift source tables using automated scripts.
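A hedged sketch of one common way to load a Hive table into Redshift, staging the data to S3 with Spark and then issuing a COPY through psycopg2; the bucket, IAM role, cluster endpoint, and table names are hypothetical, and the real scripts may have used a different client or file format.

```python
# Sketch of a Hive-to-Redshift load: export the Hive table to S3 as Parquet,
# then COPY the staged files into Redshift. All identifiers are illustrative.
import psycopg2
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-to-redshift")
         .enableHiveSupport()
         .getOrCreate())

S3_STAGE = "s3://example-staging-bucket/bic/orders/"              # hypothetical
IAM_ROLE = "arn:aws:iam::123456789012:role/redshift-copy-role"    # hypothetical

# 1) Stage the Hive table to S3 as Parquet.
spark.table("bic_db.orders").write.mode("overwrite").parquet(S3_STAGE)

# 2) COPY the staged files into the Redshift target table.
conn = psycopg2.connect(host="example-cluster.redshift.amazonaws.com",
                        port=5439, dbname="analytics",
                        user="loader", password="***")
with conn, conn.cursor() as cur:
    cur.execute(f"""
        COPY analytics.orders
        FROM '{S3_STAGE}'
        IAM_ROLE '{IAM_ROLE}'
        FORMAT AS PARQUET;
    """)
conn.close()
```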
Environment: Windows & UNIX, Amazon Web Services, AWS S3, Redshift, Apache Hadoop, Hive, Pig, shell scripts, Java, Business Objects reporting tool, CA Rally, Agile methodology.
Confidential, Denver, Colorado
Hadoop Developer
Responsibilities:
- Involved in loading data from UNIX file system to HDFS.
- Provided quick responses to ad hoc internal and external client requests for data and created ad hoc reports.
- Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Good knowledge of the MapR and Hortonworks distributions of Hadoop.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Involved in working on the Cassandra database to analyze how the data gets stored.
- Developed MapReduce programs using MRv1 and MRv2.
- Designed and developed Pig Latin scripts to process data in batches for end analysis.
- Used the Zookeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
- Analyzed the data by running Hive queries and Pig scripts to understand customer behaviour.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Good understanding of performance tuning with NoSQL, Kafka, Storm, and SQL technologies.
- Designed and developed a framework to leverage platform capabilities using MapReduce and Hive UDFs.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
Environment: MapR, Pig, MapReduce, Hive, NoSQL, Kafka, Storm, Zookeeper, Oozie, Cassandra, Unix, Hadoop distributions, HDFS, Impala, Java, Sqoop, Tableau, Oozie workflows, shell scripts, JUnit, Python, AWS.
Confidential, San Francisco, California
Hadoop Developer
Responsibilities:
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Used Sqoop to import data from MySQL into HDFS on a regular basis.
- Developed and ran MapReduce jobs on YARN and Hadoop clusters to produce daily and monthly reports per user needs.
- Developed UDFs in Java for Hive and Pig.
- Scheduled and managed jobs on the Hadoop cluster using Oozie workflows.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data (see the sketch after this list).
- Developed multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, and CSV.
- Worked with both external and managed Hive tables for optimized performance.
- Hands-on experience using Hive partitioning and bucketing, executing different types of joins on Hive tables, and implementing Hive SerDes such as JSON and Avro.
- Involved in creating Hive tables, loading data, and writing Hive queries and Pig scripts.
- Stored, processed, and analyzed huge datasets to derive valuable insights from them.
- Hands-on experience extracting and manipulating data and building complex formulas in Tableau for various business calculations.
- Deployed Tableau dashboards and worksheets in a clustered environment.
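A minimal sketch of a Python Hadoop Streaming job of the kind referenced above; the assumption that each XML record sits on its own line and carries a <status> element is purely illustrative.

```python
#!/usr/bin/env python3
# Minimal Hadoop Streaming sketch: mapper and reducer in one file, selected
# by a command-line argument. The <status> element and the one-record-per-line
# layout are illustrative assumptions about the XML feed.
import re
import sys

STATUS_RE = re.compile(r"<status>(.*?)</status>")

def mapper():
    # Emit "status \t 1" for every record that carries a <status> element.
    for line in sys.stdin:
        match = STATUS_RE.search(line)
        if match:
            print(f"{match.group(1)}\t1")

def reducer():
    # Streaming sorts by key, so counts for one status arrive contiguously.
    current_key, total = None, 0
    for line in sys.stdin:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current_key:
            if current_key is not None:
                print(f"{current_key}\t{total}")
            current_key, total = key, 0
        total += int(value)
    if current_key is not None:
        print(f"{current_key}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

Such a script would typically be submitted with the standard streaming jar, passing it as both the -mapper and -reducer command (with the "map" or "reduce" argument); the script name and element shown here are hypothetical.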
Environment: Informatica 8.x/9.x, Oracle 10g, Java, SQL, PL/SQL, Unix shell scripting, XML, Teradata, Aster, Hive, Pig, Hadoop, MapReduce, ClearCase, HP-UX, Windows XP Professional.
Confidential
Java/J2EE Developer
Responsibilities:
- Performed Requirement Gathering & Analysis by actively soliciting, analyzing and negotiating customer requirements and prepared the requirements specification document for the application.
- Prepared the detailed design document for the project by developing business process flows, requirements definitions, use cases, and the object model.
- Used MVC architecture in the project development.
- Worked with core Java for multithreading, arrays, and GUI development (AWT).
- Experience with markup languages such as HTML, DHTML, and XML, and with Cascading Style Sheets (CSS).
- Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, modeling, analysis, design and development.
- Involved in Servlet and JavaBean programming on the server side for communication between clients and the server.
- Used CSS style sheets for presenting data from XML documents and data from databases to render on HTML web pages. Developed the client classes for the Web Service implementing SOAP.
- Involved in development of a generic Data access object (DAO) layer module for user accounts and sales reporting using JDBC to interface with database systems running on Oracle.
- Designed and implemented a GUI framework for Swing: developers using the framework define actions and popup menus in XML, and the framework builds the graphical components.
- Developed the web application using the JSP framework.
- Configured Spring and EJB to manage JavaBeans and set their dependencies in a context file.
- Experience with various JavaScript technologies, i.e., jQuery, AJAX, JSON, and AngularJS.
- Published and consumed web services using SOAP and WSDL and deployed them on the WebLogic server.
- Strong knowledge of SQL and PL/SQL and good at writing stored procedures and triggers in Oracle 8i/9i/10g.
- Worked extensively with AJAX for bringing data from backend without reloading the page.
- Followed Agile methodology and Scrum to deliver the product with cross-functional teams.
Environment: Java, J2EE, EJB, JNDI, JMS, JDBC, Servlets, JSP, XML, SAX, design patterns, MVC, Struts, CSS, HTML, DHTML, JavaScript 1.2, UML, JUnit, SOAP, WSDL, web services, OAS, Javadoc, VSS, Solaris 8, C++, MySQL 3.2.