Sr. Big Data Engineer Resume
Livonia, MI
SUMMARY
- 7+ years of experience in IT, including the design and development of object-oriented, web-based enterprise applications and big data processing applications.
- Experienced in developing big data applications for processing terabytes of data using the Hadoop ecosystem (HDFS, MapReduce, HBase, Sqoop, Apache Kafka, Hive, Pig, Oozie), with in-depth knowledge of the MR1 (classic) and MR2 (YARN) frameworks.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Expertise in installing, configuring, supporting, and managing big data platforms and the underlying infrastructure of Hadoop clusters.
- Experienced with major components of the Hadoop ecosystem, including Hadoop MapReduce, HDFS, Hive, Pig, and Sqoop.
- Experienced with fast streaming big data components such as Flume, Kafka, Storm, and Spark.
- Excellent understanding of and hands-on experience with NoSQL databases such as Cassandra, MongoDB, and HBase.
- Experienced in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and core Java design patterns.
- Extensive knowledge of creating PL/SQL stored procedures, packages, functions, and cursors against Oracle (9i, 10g, 11g, 12c) and MySQL.
- Experienced in preparing and executing Unit Test Plan and Unit Test Cases using JUnit, MRUnit.
- Experienced with build tools like Maven, Ant and CI tools like Jenkins.
- Excellent experience with version controls like CVS, SVN and Git.
- Experienced in Scrum, Agile and Waterfall models.
- Extensive knowledge in NoSQL databases like HBase, Cassandra.
- Experienced in performing CRUD operations using the HBase Java Client API and REST API.
- Experienced with the Oozie Workflow Engine to automate and parallelize Hadoop MapReduce, Hive, and Pig jobs.
- Experienced with processing different file formats like Avro, XML, JSON and Sequence file formats using MapReduce programs.
- Excellent Java development skills using J2EE frameworks such as Spring, Hibernate, EJB, and Web Services.
- Experienced in implementing SOAP- and REST-based web services.
- Excellent experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Strong experience in collecting and storing stream data like log data in HDFS using Apache Flume.
- Experienced in applying MapReduce design patterns to solve complex MapReduce problems.
- Experienced on extending Hive and Pig core functionality by writing custom UDFs.
- Experienced in Hadoop administration activities such as installation and configuration of clusters using Apache and Cloudera distributions.
- Excellent knowledge of Amazon AWS concepts such as the EMR and EC2 web services for fast, efficient processing of big data.
- Experienced in integrating various data sources such as Java applications, RDBMS, shell scripts, spreadsheets, and text files.
- Ability to blend technical expertise with strong conceptual, business, and analytical skills to deliver quality solutions, combining result-oriented problem solving with leadership.
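The MapReduce concepts referenced above can be sketched without a cluster. The following is a plain-Java, dependency-free model of the map/shuffle/reduce flow (class and method names are illustrative, not from any project above); a real job would implement Hadoop's Mapper and Reducer classes instead.

```java
import java.util.*;

// Illustrative, dependency-free model of MapReduce word count:
// map emits (word, 1) pairs, the shuffle groups by key, reduce sums.
public class WordCountModel {

    // "map" phase: tokenize a line into (word, 1) pairs
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // "shuffle" + "reduce": group pairs by key and sum the values
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (String line : new String[]{"big data big cluster", "data node"}) {
            shuffled.addAll(map(line));
        }
        System.out.println(reduce(shuffled)); // {big=2, cluster=1, data=2, node=1}
    }
}
```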
TECHNICAL SKILLS
Languages: Java, C, C++, SQL, PL/SQL, XML, HTML, JavaScript
Hadoop/Big Data Technologies: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, ZooKeeper, Cloudera Manager, MongoDB, HBase (NoSQL)
Version Control Tools: GitHub, Bitbucket, CVS, SVN, ClearCase, Visual SourceSafe
Databases: Oracle 8i/9i/10g/11g/12c, MS SQL Server 2005, MySQL, Teradata
Build & Deployment Tools: Maven, Ant, Hudson, Jenkins
Monitoring and Reporting: Tableau, custom shell scripts
Hadoop Distributions: Hortonworks, Cloudera, MapR
PROFESSIONAL EXPERIENCE
Confidential, Livonia MI
Sr. Big Data Engineer
Responsibilities:
- Worked on Hadoop technologies like Pig Latin, Hive, Sqoop and Big Data testing.
- Worked on tools Flume, Kafka, Storm and Spark.
- Developed automated scripts for ingesting roughly 200 TB of data from Teradata as a bi-weekly refresh.
- Developed Hive scripts for end-user/analyst requirements for ad hoc analysis.
- Used partitioning and bucketing concepts in Hive and designed both managed and external tables in Hive for optimized performance.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation, and of how they translate to MapReduce jobs.
- Extensively used Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases (Oracle) for product-level forecasting.
- Worked in tuning Hive and Pig scripts to improve performance.
- Developed UDFs in Java as needed for use in Pig and Hive queries.
- Extracted the data from Teradata into HDFS using Sqoop.
- Created Sqoop job with incremental load to populate Hive External tables.
- Developed TWS workflow for scheduling and orchestrating the ETL process.
- Used Impala to read, write, and query Hadoop data in HDFS, HBase, or Cassandra.
- Performed functional, non-functional, and performance testing of key systems prior to cutover to AWS.
- Developed programs in Spark based on the application for faster data processing than standard MapReduce programs.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Extracted feeds from social media sites such as Twitter.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Configured Hadoop system files to accommodate new data sources and updated the existing Hadoop cluster configuration.
- Involved in gathering business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Worked on importing and exporting data from databases such as Oracle and Teradata into HDFS and Hive using Sqoop.
- Actively participated in code reviews and meetings and resolved technical issues.
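The Java UDFs for Pig and Hive mentioned above center on a single evaluate() method called once per row. Below is a dependency-free sketch of the kind of string-cleanup logic such a UDF wraps; in a real Hive UDF the class would extend org.apache.hadoop.hive.ql.exec.UDF and use Text rather than String, and the "product code" scenario here is hypothetical.

```java
// Illustrative core of a Hive/Pig UDF that normalizes free-text codes.
// A real Hive UDF would extend org.apache.hadoop.hive.ql.exec.UDF and
// operate on org.apache.hadoop.io.Text; this keeps only the evaluate() logic.
public class NormalizeCodeUdf {

    // Hive calls evaluate() once per row; null in, null out.
    public static String evaluate(String raw) {
        if (raw == null) {
            return null;
        }
        // strip whitespace and punctuation, upper-case the rest
        String cleaned = raw.trim().replaceAll("[^A-Za-z0-9]", "").toUpperCase();
        return cleaned.isEmpty() ? null : cleaned;
    }

    public static void main(String[] args) {
        System.out.println(evaluate(" ab-123 "));  // AB123
        System.out.println(evaluate(null));        // null
    }
}
```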
Environment: Java 7, Eclipse, Oracle 12c, Hadoop, MapReduce, HDFS, Kafka, Hive, HBase, TWS, ITG Linux, AWS, MapR, SQL, Talend 5.5.2.
Confidential, NYC NY
Sr. Big Data Engineer
Responsibilities:
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and extracted the data from MySQL into HDFS using Sqoop.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Developed simple to complex MapReduce jobs using Hive.
- Analyzed the data by performing Hive queries and running Pig scripts to know user behavior.
- Optimized Map/Reduce jobs to use HDFS efficiently by using various compression mechanisms.
- Created partitioned tables in Hive.
- Extensively used Pig for data cleansing.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Developed the Pig UDF's to pre-process the data for analysis.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Worked on streaming the data into HDFS from web servers using Flume.
- Designed and implemented Hive and Pig UDF's for evaluation, filtering, loading and storing of data.
- Created Hive tables as internal or external per requirements, defined with appropriate static and dynamic partitions for efficiency.
- Wrote Scripts to generate Map Reduce jobs and performed ETL procedures on the data in HDFS.
- Implemented Lateral View in conjunction with UDTFs in Hive.
- Performed complex Joins on the tables in Hive.
- Loaded and transformed large sets of structured and semi-structured data using Hive and Impala.
- Connected Hive and Impala to Tableau reporting tool and generated graphical reports.
- Worked on implementation and maintenance of a Cloudera Hadoop cluster.
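The Lateral View/UDTF work above turns one row with a delimited column into many rows. As a dependency-free illustration (the column names and delimiter are hypothetical), this models what `LATERAL VIEW explode(split(tags, ','))` does in Hive; a real UDTF would extend GenericUDTF and forward rows to the collector.

```java
import java.util.*;

// Illustrative model of Hive's LATERAL VIEW explode(): one input row with a
// delimited column becomes one output row per element, with the other
// columns repeated alongside each element.
public class LateralViewModel {

    // Simulates: SELECT user, tag FROM t LATERAL VIEW explode(split(tags, ',')) x AS tag
    static List<String[]> explode(String user, String tags) {
        List<String[]> rows = new ArrayList<>();
        for (String tag : tags.split(",")) {
            rows.add(new String[]{user, tag.trim()});
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : explode("u1", "sports, news")) {
            System.out.println(row[0] + " | " + row[1]);
        }
        // u1 | sports
        // u1 | news
    }
}
```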
Environment: Hadoop, HDFS, Pig 0.10, Hive, AWS, MapReduce, Sqoop, Java Eclipse, SQL Server, Shell Scripting.
Confidential, SFO, CA
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Imported and exported data into HDFS and Hive using Sqoop.
- Designed and developed Big Data analytics platform for processing customer viewing preferences and social media comments using Java, Hadoop, Hive and Pig.
- Integrated Hadoop into traditional ETL, accelerating the extraction, transformation, and loading of massive structured and unstructured data.
- Experienced in defining job flows.
- Developed and executed custom MapReduce programs, Pig Latin scripts, and HQL queries.
- Used Hadoop FS shell commands for HDFS (Hadoop File System) data loading and manipulation.
- Performed Hive test queries on local sample files and HDFS files.
- Developed and optimized Pig and Hive UDFs (User-Defined Functions) to implement the functionality of external languages as and when required.
- Extensively used Pig for data cleaning and optimization.
- Developed Hive queries to analyze data and generate results.
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
- Analyzed business requirements and cross-verified them against the functionality and features of NoSQL databases such as HBase and Cassandra to determine the optimal database.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Installed and configured Apache Hadoop, Hive and Pig environment on the prototype server.
- Configured SQL database to store Hive metadata.
- Loaded unstructured data into Hadoop File System (HDFS).
- Created ETL jobs to load Twitter JSON data and server data into MongoDB and transported MongoDB into the Data Warehouse.
- Responsible for managing data coming from different sources.
- Responsible for implementing MongoDB to store and analyze unstructured data.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Implemented CDH3 Hadoop cluster on CentOS.
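The data-cleaning MapReduce jobs described above typically filter malformed records before loading. This is a dependency-free sketch of that filtering step; the 3-column CSV layout is a hypothetical example, and in a real job this logic would live inside Mapper.map() and emit to the context rather than return a list.

```java
import java.util.*;

// Illustrative record-cleaning step of the kind a MapReduce mapper performs:
// drop malformed CSV lines (wrong column count or empty key) before loading.
public class CleanRecords {

    static final int EXPECTED_COLUMNS = 3; // hypothetical schema width

    // Returns only well-formed lines; a real job would emit these from map().
    static List<String> clean(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String[] cols = line.split(",", -1);
            if (cols.length == EXPECTED_COLUMNS && !cols[0].trim().isEmpty()) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("1,foo,9.5", "bad line", ",foo,1.0", "2,bar,3.2");
        System.out.println(clean(raw)); // [1,foo,9.5, 2,bar,3.2]
    }
}
```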
Environment: Hadoop, MapReduce, HDFS, Hive, Spark, Pig, Java (JDK 1.6), SQL, Cloudera Manager, Sqoop, Storm, Solr, Flume, Cassandra, Oozie, Eclipse
Confidential
Java/J2EE developer
Responsibilities:
- Developed Servlets and Java Server Pages (JSP).
- Wrote pseudo-code for stored procedures.
- Developed PL/SQL queries to generate reports based on client requirements.
- Enhanced the system according to customer requirements.
- Designed and developed UI pages in the CBMS application using the CBMS custom framework, business objects, JDBC, JSP, and JavaScript.
- Involved in business requirement gatherings, development of technical design documents and design of real time eligibility project.
- Developed Real Time Eligibility web service using CBMS custom framework, AJAX 2.0, WSDL and SOAP UI.
- Used JAXB Marshaller and Unmarshaller to marshal and unmarshal WSDL requests.
- Developed all WSDL components and XSDs, producing and consuming WSDL web services using AJAX 1.5 and AJAX 2.0.
- Developed Java services using SQL queries, JDBC, Spring, and Hibernate entities.
- Used Eclipse for development, debugging, and deployment of the code. Created test-case scenarios for functional testing.
- Used JavaScript validation in JSP pages.
- Helped design the database tables for optimal storage of data.
- Coded JDBC calls in the servlets to access the Oracle database tables.
- Responsible for Integration, unit testing, system testing and stress testing for all the phases of project.
- Prepared final guideline document that would serve as a tutorial for the users of this application.
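The project used JAXB to unmarshal WSDL requests; since JAXB needs external jars on modern JDKs, the sketch below illustrates the same idea (pulling a field out of an XML request payload) with the JDK's built-in DOM parser instead. The element names are hypothetical, not from the actual service.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Dependency-free stand-in for JAXB unmarshalling: parse a SOAP-style
// request body and extract one field with the JDK's DOM parser.
public class EligibilityRequestParser {

    // Returns the text of the first <caseId> element, or null if the
    // payload is malformed or the element is absent.
    static String extractCaseId(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("caseId");
            return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : null;
        } catch (Exception e) {
            // malformed payload: treat as no match (real code would log/raise)
            return null;
        }
    }

    public static void main(String[] args) {
        String request = "<eligibilityRequest><caseId>C-42</caseId></eligibilityRequest>";
        System.out.println(extractCaseId(request)); // C-42
    }
}
```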
Environment: Java 1.5, Servlets, J2EE 1.4, JDBC, Oracle 10g, PL/SQL, HTML, JSP, Eclipse, UNIX.