Hadoop Developer Resume
San Antonio
PROFESSIONAL SUMMARY:
- More than 6 years of experience in Hadoop/Big Data technologies such as Hadoop, Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Storm, Flink, Flume, Impala, Tez, Kafka and Spark, with hands-on experience in writing MapReduce/YARN and Spark/Scala jobs.
- Deep expertise in Analysis, Design, Development and Testing phases of Enterprise Data Warehousing solutions.
- Expertise in Tableau BI reporting tools, Tableau dashboard development and Tableau Server administration.
- Generated final reporting data in Tableau for testing by connecting to the corresponding Hive tables through the Hive ODBC connector.
- Around 1 year of experience in Business Objects Desktop Intelligence, Web Intelligence, Universe Designer, Crystal Reports and Central Management Console.
- Experience in data ingress and egress using Sqoop from HDFS to Relational Database Systems and vice versa. Good knowledge of Log4j for error handling.
- Expert knowledge of real-time data analytics using Apache Storm.
- Expertise in Java/J2EE technologies such as Core Java, Spring, Hibernate, JDBC, JSON, HTML, Struts, Servlets, JSP, JBoss and JavaScript.
- Experience in designing both time driven and data driven automated workflows using Oozie.
- Responsible for developing data pipelines using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Worked in Agile methodology of software development process as a Scrum Master.
- Extensive work experience in ETL processes consisting of data sourcing, data transformation, mapping and loading of data from multiple source systems into Data Warehouse using Informatica Power Center.
- Strong foundation in programming and debugging; developed modules that met client requirements and targets.
- Expertise in Hadoop administration tasks such as managing clusters and reviewing Hadoop log files.
- Expertise in Data warehousing concepts, Dimensional Modeling and Data Modeling systems.
- Extensive experience in developing test cases and performing Unit Testing and Integration Testing, using source code management tools such as Git, SVN and Perforce.
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
- Experience using integrated development environments such as Eclipse, NetBeans, JDeveloper and MyEclipse.
- Good experience in writing complex SQL queries against databases such as DB2, Oracle 10g, MySQL and MS SQL Server.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs (see the sketch below).
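To make the last point concrete, below is a minimal PySpark sketch of the same join/grouping/aggregation pattern; broadcasting the small dimension table avoids the shuffle-heavy reduce-side join that an unhinted Hive query would otherwise compile to. The table contents and column names are illustrative assumptions, not taken from any actual project.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("join-aggregation-sketch").getOrCreate()

# Hypothetical fact and dimension data standing in for Hive tables.
claims = spark.createDataFrame(
    [(1, "TX", 1200.0), (2, "TX", 300.0), (3, "CA", 950.0)],
    ["claim_id", "state_code", "amount"])
states = spark.createDataFrame(
    [("TX", "Texas"), ("CA", "California")],
    ["state_code", "state_name"])

# Broadcasting the small side keeps the join map-side only, the same idea
# as a Hive map join, so only the aggregation needs a shuffle.
totals = (claims.join(F.broadcast(states), "state_code")
                .groupBy("state_name")
                .agg(F.sum("amount").alias("total_amount"),
                     F.count("*").alias("claim_count")))

totals.show()
```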
TECHNICAL SKILLS:
Web/Application Servers: Apache Tomcat, WebLogic, WebSphere.
Tools: Eclipse, NetBeans.
Languages: Java/J2EE, Scala, Python, HTML, SQL, Spring, Hibernate, JDBC, JSON, PySpark and JavaScript.
Big Data Technologies: Hadoop, MapReduce, Pig, Hive, HBase, Sqoop, Oozie, ZooKeeper, Avro, Kafka, Spark, Parquet, Flume, Storm, Impala, Scala, Mahout, Hue, Flink, Tez, HCatalog.
Databases: RDBMS (DB2, Teradata, SQL Server), NoSQL (HBase, Cassandra).
Business Tools: Tableau (data visualization), Business Objects XI R2, Informatica PowerCenter 8.x, OLAP/OLTP, Dimensional Modeling, Data Modeling.
PROFESSIONAL EXPERIENCE:
Confidential, San Antonio
Hadoop Developer
Responsibilities:
- Responsible for loading the customer's data and event logs from Kafka into HBase using REST API.
- Created Hive tables from JSON data using data serialization frameworks such as Avro.
- Used PySpark to perform MapReduce-style processing, externally sorting and hashing data from a file larger than the available RAM (see the sketch after this list).
- Extensively used Pig for data cleansing and wrote Hive queries for the analysts.
- Wrote Python applications that interacted with the MySQL database through Spark's SQLContext and accessed Hive tables through HiveContext.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Designed and published workbooks and dashboards using Tableau Desktop and Tableau Server.
- Developed high-integrity Spark programs for systems where predictable and highly reliable operation is essential.
- Built and maintained scalable data pipelines using the Hadoop ecosystem and other open-source components such as Hive and HBase.
- Worked on Hive to expose data for further analysis and to transform files from different analytical formats into text files.
- Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
- Involved in running Hadoop jobs that processed millions of text records for batch and online processes, using tuned and modified SQL.
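A minimal PySpark sketch of the out-of-core sort mentioned above: spreading the data across many hash partitions lets each partition sort independently and spill to local disk, so the job never needs to hold the whole file in RAM. The input/output paths, column names and partition count are illustrative assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("external-sort-sketch").getOrCreate()

# Hypothetical input that is larger than the available RAM.
events = spark.read.json("hdfs:///data/weblogs/events/")

sorted_events = (events
                 .repartition(400, F.col("customer_id"))  # hash-partition by key
                 .sortWithinPartitions("event_ts"))        # per-partition external sort

sorted_events.write.mode("overwrite").parquet("hdfs:///data/weblogs/events_sorted/")
```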
Environment: Hadoop (HDFS) multi-node installation, MapReduce, Spark, Kafka, Hive, Impala, Flume, Storm, ZooKeeper, Oozie, Java, Scala, PySpark, UNIX Shell Scripting, TestNG, MySQL, Eclipse, Toad, Tableau and HP Vertica.
Confidential, Los Angeles
Hadoop Developer
Responsibilities:
- Responsible for loading the customer's data and event logs from Kafka into HBase using REST API.
- Developed workflows in Oozie for business requirements to extract the data using Sqoop.
- Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews, test development, test automation.
- Developed MapReduce pipeline jobs to process the data and create the necessary HFiles.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (see the sketch after this list).
- Created WebI reports with multiple data providers and synchronized the data using Merge Dimensions.
- Developed WebI reports (on-demand, ad hoc, frequency, summary and sub reports, drill-down, cross-tab).
- Involved in creating and scheduling Oozie workflow scripts to run series of Sqoop imports, MapReduce Transformation jobs, Hive scripts.
- Worked on various Business Object Reporting functionalities such as Slice and Dice, Master/detail, User Response function and different Formulas.
- Strong debugging and problem solving skills with excellent understanding of system development methodologies, techniques and tools.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
- Responsible for developing a data pipeline using HDInsight, Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Designed and implemented various metrics that can statistically signify the success of an experiment.
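As a companion to the partitioning/bucketing point above, here is a minimal sketch of both ideas expressed through Spark SQL with Hive support enabled. The database, table and column names, bucket count and HDFS location are illustrative assumptions rather than details of the actual project.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partition-bucket-sketch")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS sales")

# External, partitioned table: dropping it leaves the HDFS data in place, and
# queries filtering on order_date only scan the matching partition directories.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales.orders_ext (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS ORC
    LOCATION 'hdfs:///warehouse/sales/orders_ext'
""")

# Managed, bucketed table: rows are hashed on customer_id into 32 buckets,
# which speeds up joins and sampling on that key.
orders = spark.table("sales.orders_ext")
(orders.write
       .bucketBy(32, "customer_id")
       .sortBy("customer_id")
       .mode("overwrite")
       .saveAsTable("sales.orders_bucketed"))
```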
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Flume, Oozie, Spark, Hue, Linux, Teradata, Java APIs, Java Collections, SQL, Business Objects XI R2.
Confidential, San Francisco
Hadoop Developer
Responsibilities:
- Involved in the design and development of Data Warehouse.
- Extensively used SQL and PL/SQL for development of Procedures, Functions, Packages and Triggers.
- Involved in using Pig Latin to analyze large-scale data.
- Implemented optimized joins across different data sets to get the top claims by state using MapReduce (see the sketch after this list).
- Extensively used Informatica PowerCenter 9.5/8.6.1 as the ETL tool for developing the project.
- Used Shell Scripts for loading, unloading, validating and records auditing purposes.
- Acquired good understanding and experience of NoSQL databases such as HBase and Cassandra.
- Provided a batch-processing solution for large volumes of unstructured data using the Hadoop MapReduce framework.
- Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Generated final reporting data using Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC connector.
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
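The top-claims-by-state job referenced above was written as MapReduce; for brevity, the sketch below expresses the same join-and-rank logic in PySpark. The input paths, column names and the cutoff of ten claims per state are illustrative assumptions.

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("top-claims-by-state-sketch").getOrCreate()

# Hypothetical inputs: claims plus a member data set keyed by member_id.
claims = spark.read.parquet("hdfs:///data/claims/")
members = spark.read.parquet("hdfs:///data/members/")

# Join the data sets, rank claims within each state by amount,
# and keep the ten largest per state.
by_state = Window.partitionBy("state").orderBy(F.col("claim_amount").desc())

top_claims = (claims.join(members, "member_id")
                    .withColumn("rank", F.row_number().over(by_state))
                    .filter(F.col("rank") <= 10))

top_claims.write.mode("overwrite").parquet("hdfs:///data/reports/top_claims_by_state/")
```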
Environment: Informatica 8.x/9.x, Oracle 10g, Java, SQL, PL/SQL, Unix Shell Scripting, XML, Teradata Aster, Hive, Pig, Hadoop, MapReduce, ClearCase, HP-UX, Windows XP Professional.
Confidential
Java Developer
Responsibilities:
- Involved in requirements analysis, design, development and testing.
- Implemented the Web Service client for the login authentication, credit reports and applicant information using Apache Axis 2 Web Service.
- Implemented dynamic screen functionality using jQuery and asynchronous data retrieval using AJAX.
- Involved in moving all log files generated from various sources to HDFS for further processing.
- Worked with the AJAX framework to get asynchronous responses to user requests and used JavaScript for validation.
- Used Enterprise Java Beans (EJBs) extensively in the application; developed and deployed Session Beans to perform user authentication.
- Used JavaScript to validate input and manipulate HTML elements, and developed external JavaScript code reusable across several different web pages.
- Designed and developed critical modules such as Order Processing, Order Making, Agents and Reports Generation.
- Created design specifications for application development covering the front end and back end, using design patterns and UML.
- Experience with relational database concepts and development against multiple RDBMS databases including Oracle, MySQL and MS SQL Server, and SQL dialects such as PL/SQL.
- Designed database tables and was involved in writing SQL queries and PL/SQL, including stored procedures, triggers, cursors, dblinks, object types and functions.
Environment: Hadoop, MapReduce, Pig, Hive, Servlets, Enterprise Java Beans, Custom Tags, Stored Procedures, JavaScript, Java, Spring Framework, Struts, Web Services, Oracle.