Big Data Engineer Resume
Columbia, MD
OBJECTIVE
- Looking for a challenging position as HADOOP DEVELOPER/ENGINEER where I can use my knowledge, technical and analytical skills to contribute to projects that add value to the organization.
SUMMARY
- 12+ years of experience in Information Technology, with skills in analysis, design, development, testing and deployment of software applications, including web and Windows applications with an emphasis on object-oriented programming, and mainframe applications.
- 3+ years of work experience in Big Data analytics as a Hadoop Developer/Engineer.
- Experienced with major Hadoop ecosystem projects such as Pig, Hive and HBase, and with monitoring them using Cloudera Manager, AWS and Hortonworks.
- Extensive experience in developing Pig Latin Scripts and using Hive Query Language for data analytics
- Hands on experience working on NoSQL databases including HBase and its integration with Hadoop cluster
- Hands on experience in writing Map Reduce jobs in the Hadoop ecosystem, including Hive and Pig.
- Hands on experience in installing, configuring and using ecosystem components like Hadoop Map Reduce, HDFS, Pig, Hive, Sqoop, Python, Scala and Spark.
- Handled and further processed schema-oriented and non-schema-oriented data using Pig.
- Read, processed and stored disparate data in parallel using Pig.
- Good knowledge of database connectivity (JDBC) for databases like Oracle, DB2, SQL Server, MySQL and Netezza.
- Experienced in coding SQL, Procedures/Functions, Triggers and Packages on database (RDBMS) packages like Oracle.
- Developed stored procedures and queries using SQL.
- Worked on Agile methodology, SOA for many of the applications.
- Excellent analytical, problem solving, communication and interpersonal skills, with the ability to interact with individuals at all levels and to work as part of a team as well as independently.
- In-depth understanding of Data Structure and Algorithms.
- Strong written, oral, interpersonal and presentation communication skills.
- Ability to perform at a high level, meet deadlines and adapt to ever-changing priorities.
- Good knowledge in using job scheduling and monitoring tools like Oozie and Zookeeper
- Mentored the team in UNIX and open source tools/platforms revolving around Hadoop.
TECHNICAL SKILLS
Hadoop/Big Data: HDFS, Map Reduce, Hive, Pig, Sqoop, Zookeeper, Python, Scala and Spark
NoSQL Databases: HBase
Programming Languages: Java, PL/SQL, Pig Latin, Hive QL, Unix shell scripts, Cobol, JCL, VSAM
Operating Systems: UNIX, Windows, LINUX, Z/OS
Web technologies: JSP, JDBC
Databases: Oracle 9i/10g, Netezza, Microsoft SQL Server and MySQL
Java IDE: Eclipse 3.x
Tools: TOAD, SQL Developer
PROFESSIONAL EXPERIENCE
Confidential, Columbia, MD
Big Data Engineer
Responsibilities:
- Implemented Sqoop scripts to enable interaction between Pig and the MySQL database.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Created Hive tables to store the processed results in a tabular format.
- Wrote scripts for processing data and loading it into HDFS.
- Wrote HDFS CLI commands.
- Developed UNIX shell/Python scripts for creating reports from Hive data.
- Fully involved in the requirements analysis phase.
- Responsible for building scalable distributed data solutions using Hadoop
- Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Exported the result set from Hive to Netezza using Shell scripts.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs
- Worked on Hadoop testing, including unit testing of Map-Reduce code and Hive and Pig UDFs.
- Developed manual validation test cases covering data sampling, data completeness and data quality.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation, queries and writing data back into the OLTP system through Sqoop.
- Experienced in performance tuning of Spark applications, including setting the right batch interval, the correct level of parallelism and memory tuning.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
Environment: Amazon (AWS) EC2, Hadoop 2.4/2.6, Hive, Map Reduce, Sqoop, Pig, JDK 1.6/1.7, HDFS, Flume, Tidal, HBase, Zookeeper, Mahout, Spark, Scala, Unix Shell/Python script, RESTful Web services, PL/SQL and SQL.
Confidential, Columbia, MD
Hadoop Developer
Responsibilities:
- Developed Map-Reduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Implemented Sqoop scripts to enable interaction between Pig and the MySQL database.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Created Hive tables to store the processed results in a tabular format.
- Wrote scripts for processing data and loading it into HDFS.
- Wrote HDFS CLI commands.
- Developed UNIX shell/Python scripts for creating reports from Hive data.
- Fully involved in the requirements analysis phase.
- Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and PIG to pre-process the data.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Involved in requirements gathering, design, development and testing.
- Responsible for building scalable distributed data solutions using Hadoop
- Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Exported the result set from Hive to Netezza using Shell scripts.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs
- Worked on Hadoop testing, including unit testing of Map-Reduce code and Hive and Pig UDFs.
- Developed manual validation test cases covering data sampling, data completeness and data quality.
Environment: Hadoop 2.4/2.6, Web Services (SOAP and REST), JMS, JavaScript, AngularJS, JSP, AWS, XML, XSD, Oracle PL/SQL, IBM WebSphere Portal, Hive, Map Reduce, Sqoop, Pig, JDK 1.6/1.7, HDFS, Flume, Tidal, HBase, Zookeeper, Mahout, Spark, Scala, Unix Shell/Python script.
Confidential, Fremont, CA
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop Map Reduce and HDFS; developed multiple Map Reduce jobs in Java for data cleaning and preprocessing.
- Loaded the customer profiles data, customer spending data, credit from legacy warehouses onto HDFS using Sqoop.
- Built data pipeline using Pig and Java Map Reduce to store onto HDFS.
- Used Oozie to orchestrate the Map Reduce jobs that extract the data in a timely manner.
- Applied transformations and filtered both traffic using Pig.
- Used Pattern matching algorithms to recognize the customer across different sources and built risk profiles for each customer using Hive and stored the results in HBase.
- Performed unit testing using MRUnit.
- Responsible for building scalable distributed data solutions using Hadoop
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster
- Setup and benchmarked Hadoop/HBase clusters for internal use
- Developed simple to complex Map/Reduce jobs using Hive and Pig
- Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior
- Installed Oozie workflow engine to run multiple Hive and Pig jobs
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Provided support to data analysts in running Pig and Hive queries.
- Imported and exported data between MySQL/Oracle and Hive using Sqoop.
- Imported and exported data between MySQL/Oracle and HDFS using Sqoop.
- Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it.
- Exported the result set from Hive to MySQL using Shell scripts.
- Developed HIVE queries for the analysts.
Confidential, San Ramon, CA
Java/PLSQL Developer
Responsibilities:
- Responsible for gathering and analyzing requirements and converting them into technical specifications
- Used Rational Rose for creating sequence and class diagrams
- Developed presentation layer using JSP, Java, HTML and JavaScript
- Designed and developed Hibernate configuration and session-per-request design pattern for making database connectivity and accessing the session for database transactions respectively. Used HQL and SQL for fetching and storing data in databases
- Participated in the design and development of database schema and Entity-Relationship diagrams of the backend Oracle database tables for the application
- Implemented web services with Apache Axis
- Designed and developed stored procedures and triggers in Oracle to cater to the needs of the entire application. Developed complex SQL queries for extracting data from the database
- Designed and built SOAP web service interfaces implemented in Java
- Used Apache Ant for the build process
- Used ClearCase for version control and ClearQuest for bug tracking
Confidential
PL/SQL Developer
Responsibilities:
- Experience in Oracle SQL, PL/SQL and Perl development
- Lead experience in giving recommendations and direction on development strategies, conducting code reviews, mentoring junior developers, setting standards and establishing best practices.
- Good RDBMS understanding
- Expertise in tuning Oracle SQLs
- Good Understanding of Data warehouse concepts
- Knowledge of SDLC cycles
- Strong experience in Technical documentation, coding and testing
- Expertise in problem solving through debugging, research and investigation
- Familiar with best practices and standard concepts.
- Good Communication skills
- Requirements analysis of the inputs.
- Execution of the required deliverables.
- Defect prevention activities.
- Creation of UTP, UTR.
