
Sr. Big Data Engineer Resume

Livonia, MI

SUMMARY

  • 7+ years of experience in IT, including the design and development of object-oriented, web-based enterprise applications and big data processing applications.
  • Experienced in developing big data applications that process terabytes of data using the Hadoop ecosystem (HDFS, MapReduce, HBase, Sqoop, Apache Kafka, Hive, Pig, Oozie), with in-depth knowledge of the MR1 (classic) and MR2 (YARN) frameworks.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
  • Expertise in installing, configuring, supporting, and managing Hadoop clusters and their underlying big data infrastructure.
  • Experienced with major components of the Hadoop ecosystem, including MapReduce, HDFS, Hive, Pig, and Sqoop.
  • Experienced with streaming big data components such as Flume, Kafka, Storm, and Spark.
  • Excellent understanding of and hands-on experience with NoSQL databases such as Cassandra, MongoDB, and HBase.
  • Experienced in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology, with good knowledge of J2EE and core Java design patterns.
  • Extensive knowledge of creating PL/SQL stored procedures, packages, functions, and cursors against Oracle (9i, 10g, 11g, 12c) and MySQL Server.
  • Experienced in preparing and executing unit test plans and unit test cases using JUnit and MRUnit.
  • Experienced with build tools like Maven, Ant and CI tools like Jenkins.
  • Excellent experience with version controls like CVS, SVN and Git.
  • Experienced in Scrum, Agile and Waterfall models.
  • Extensive knowledge in NoSQL databases like HBase, Cassandra.
  • Experienced in performing CRUD operations using the HBase Java Client API and REST API (see the sketch after this list).
  • Experienced with the Oozie Workflow Engine to automate and parallelize Hadoop MapReduce, Hive, and Pig jobs.
  • Experienced in processing different file formats such as Avro, XML, JSON, and SequenceFile using MapReduce programs.
  • Excellent Java development skills using J2EE frameworks such as Spring, Hibernate, EJB, and Web Services.
  • Experienced in implementing SOAP- and REST-based web services.
  • Excellent experience in analyzing data using HiveQL, PIG Latin, and custom MapReduce programs in Java.
  • Strong experience in collecting and storing stream data like log data in HDFS using Apache Flume.
  • Experienced in applying MapReduce design patterns to solve complex MapReduce problems.
  • Experienced on extending Hive and Pig core functionality by writing custom UDFs.
  • Experienced in Hadoop administration activities such as installation and configuration of clusters using Apache and Cloudera distributions.
  • Excellent knowledge of Amazon AWS concepts such as EMR and EC2 web services, which provide fast and efficient processing of big data.
  • Experienced in integrating various data sources such as Java, RDBMS, shell scripting, spreadsheets, and text files.
  • Ability to blend technical expertise with strong conceptual, business, and analytical skills to provide quality solutions, result-oriented problem solving, and leadership.
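
A minimal sketch of the HBase Java Client API CRUD work noted above, assuming the HBase 1.x client on the classpath; the customer_profile table, the info column family, and the row key are hypothetical examples.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseCrudExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("customer_profile"))) {

                // Create/update: write one row keyed by customer id
                Put put = new Put(Bytes.toBytes("cust-1001"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Jane Doe"));
                table.put(put);

                // Read: fetch the row back and print a single cell
                Result result = table.get(new Get(Bytes.toBytes("cust-1001")));
                System.out.println(Bytes.toString(
                        result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));

                // Delete: remove the row
                table.delete(new Delete(Bytes.toBytes("cust-1001")));
            }
        }
    }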

TECHNICAL SKILLS

Languages: Java, C, C++, SQL, PL/SQL, XML, HTML, JavaScript

Hadoop/Big Data Technologies: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, ZooKeeper, Cloudera Manager

NoSQL Databases: HBase, MongoDB

Version Control Tools: GitHub, Bitbucket, CVS, SVN, ClearCase, Visual SourceSafe

Databases: Oracle 8i/9i/10g/11g/12c, MS SQL Server 2005, MySQL, Teradata

Build & Deployment Tools: Maven, ANT, Hudson, Jenkins

Monitoring and Reporting: Tableau, custom shell scripts

Hadoop Distributions: Hortonworks, Cloudera, MapR

PROFESSIONAL EXPERIENCE

Confidential, Livonia MI

Sr. Big Data Engineer

Responsibilities:

  • Worked on Hadoop technologies including Pig Latin, Hive, and Sqoop, as well as big data testing.
  • Worked on streaming tools such as Flume, Kafka, Storm, and Spark.
  • Developed automated scripts for ingesting data from Teradata, handling bi-weekly refreshes of around 200 TB of data.
  • Developed Hive scripts for end-user and analyst requirements for ad hoc analysis.
  • Used partitioning and bucketing concepts in Hive and designed both managed and external tables for optimized performance.
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
  • Extensively used Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases (Oracle) for product-level forecasting.
  • Worked in tuning Hive and Pig scripts to improve performance.
  • Developed UDFs in Java as and when necessary for use in Pig and Hive queries (see the sketch after this list).
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Created Sqoop jobs with incremental load to populate Hive external tables.
  • Developed TWS workflow for scheduling and orchestrating the ETL process.
  • Used Impala to read, write, and query Hadoop data in HDFS, HBase, or Cassandra.
  • Performed functional, non-functional, and performance testing of key systems prior to cutover to AWS.
  • Developed programs in Spark based on the application for faster data processing than standard MapReduce programs.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Extracted feeds from social media sites such as Twitter.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Configured Hadoop system files to accommodate new data sources and updated the existing Hadoop cluster configuration.
  • Involved in gathering business requirements and prepared detailed specifications that follow the project guidelines required to develop the programs.
  • Worked on importing and exporting data from different databases such as Oracle and Teradata into HDFS and Hive using Sqoop.
  • Actively participated in code reviews and meetings and helped resolve technical issues.
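
A minimal sketch of the kind of Java UDF mentioned above for use in Hive queries; the NormalizeCode class, the jar name, and the product-code normalization logic are hypothetical, assuming the Hive exec library is available at compile time.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF: normalizes product codes so equivalent values join cleanly.
    // Registered in Hive with, e.g.:
    //   ADD JAR normalize-udf.jar;
    //   CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode';
    public class NormalizeCode extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            // Trim whitespace and upper-case the value before it is compared or joined
            return new Text(input.toString().trim().toUpperCase());
        }
    }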

Environment: Java 7, Eclipse, Oracle 12c, Hadoop, MapReduce, HDFS, Kafka, Hive, HBase, TWS, ITG Linux, AWS, MapR, SQL, Talend 5.5.2.

Confidential, NYC NY

Sr. Big Data Engineer

Responsibilities:

  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Developed simple to complex MapReduce jobs using Hive.
  • Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior.
  • Optimized Map/Reduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Created partitioned tables in Hive.
  • Extensively used Pig for data cleansing.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Developed Pig UDFs to pre-process the data for analysis (see the sketch after this list).
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Worked on streaming the data into HDFS from web servers using Flume.
  • Designed and implemented Hive and Pig UDFs for evaluation, filtering, loading, and storing of data.
  • Created Hive tables as internal or external tables per requirements, defined with appropriate static and dynamic partitions for efficiency.
  • Wrote Scripts to generate Map Reduce jobs and performed ETL procedures on the data in HDFS.
  • Implemented Lateral View in conjunction with UDTFs in Hive.
  • Performed complex Joins on the tables in Hive.
  • Loaded and transformed large sets of structured and semi-structured data using Hive and Impala.
  • Connected Hive and Impala to Tableau reporting tool and generated graphical reports.
  • Worked on implementation and maintenance of the Cloudera Hadoop cluster.
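
A minimal sketch of a Pig UDF of the kind described above for pre-processing web server output; the ExtractStatusCode class and the assumption that log lines follow the combined log format (status code in the ninth space-separated field) are hypothetical.

    import java.io.IOException;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // Hypothetical UDF: pulls the HTTP status code out of a raw log line so a
    // downstream GROUP BY can aggregate on it. Registered with REGISTER and
    // invoked from a Pig Latin script as ExtractStatusCode(line).
    public class ExtractStatusCode extends EvalFunc<Integer> {
        @Override
        public Integer exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            String[] fields = input.get(0).toString().split(" ");
            // Guard against malformed lines instead of failing the whole job
            if (fields.length < 9) {
                return null;
            }
            try {
                return Integer.parseInt(fields[8]);
            } catch (NumberFormatException e) {
                return null;
            }
        }
    }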

Environment: Hadoop, HDFS, Pig 0.10, Hive, AWS, MapReduce, Sqoop, Java, Eclipse, SQL Server, Shell Scripting.

Confidential, SFO, CA

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the sketch after this list).
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Designed and developed Big Data analytics platform for processing customer viewing preferences and social media comments using Java, Hadoop, Hive and Pig.
  • Integrated Hadoop into traditional ETL, accelerating the extraction, transformation, and loading of massive structured and unstructured data.
  • Experienced in defining job flows.
  • Developed and executed custom MapReduce programs, Pig Latin scripts, and HQL queries.
  • Used Hadoop FS scripts for HDFS (Hadoop File System) data loading and manipulation.
  • Performed Hive test queries on local sample files and HDFS files.
  • Developed and optimized Pig and Hive UDFs (User-Defined Functions) to implement the functionality of external languages as and when required.
  • Extensively used Pig for data cleaning and optimization.
  • Developed Hive queries to analyze data and generate results.
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
  • Analyzed business requirements and cross-verified them with functionality and features of NOSQL databases like HBase, Cassandra to determine the optimal DB.
  • Load and transform large sets of structured, semi structured and unstructured data.
  • Installed and configured Apache Hadoop, Hive and Pig environment on the prototype server.
  • Configured SQL database to store Hive metadata.
  • Loaded unstructured data into Hadoop File System (HDFS).
  • Created ETL jobs to load Twitter JSON data and server data into MongoDB and transported MongoDB into the Data Warehouse.
  • Responsible for managing data coming from different sources.
  • Responsible for implementing MongoDB to store and analyze unstructured data.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from UNIX file system to HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Implemented CDH3 Hadoop cluster on CentOS.
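
A minimal sketch of one of the Java MapReduce data-cleaning jobs mentioned above; the CleansingMapper class and its skip rules (blank lines and # comment lines) are hypothetical, assuming a map-only job that writes the cleaned records back to HDFS.

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical map-only cleansing step: drops empty and comment lines and
    // trims whitespace before the records are loaded into Hive.
    public class CleansingMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString().trim();
            if (line.isEmpty() || line.startsWith("#")) {
                return; // skip records that carry no data
            }
            context.write(NullWritable.get(), new Text(line));
        }
    }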

Environment: Hadoop, MapReduce, HDFS, Hive, Spark, Pig, Java (JDK 1.6), SQL, Cloudera Manager, Sqoop, Storm, Solr, Flume, Cassandra, Oozie, Eclipse

Confidential

Java/J2EE developer

Responsibilities:

  • Developed Servlets and Java Server Pages (JSP).
  • Wrote pseudo-code for stored procedures.
  • Developed PL/SQL queries to generate reports based on client requirements.
  • Enhanced the system according to customer requirements.
  • Designed and developed UI pages in the CBMS application using the CBMS custom framework, business objects, JDBC, JSP, and JavaScript.
  • Involved in business requirements gathering, development of technical design documents, and design of the real-time eligibility project.
  • Developed Real Time Eligibility web service using CBMS custom framework, AJAX 2.0, WSDL and SOAP UI.
  • Used the JAXB Marshaller and Unmarshaller to marshal and unmarshal WSDL requests (see the sketch after this list).
  • Developed all WSDL components and XSDs, producing and consuming WSDL web services using AJAX 1.5 and AJAX 2.0.
  • Developed Java services using Java code, SQL queries, JDBC, Spring, and Hibernate entities.
  • Used Eclipse for development, debugging, and deployment of the code. Created test case scenarios for functional testing.
  • Used JavaScript validation in JSP pages.
  • Helped design the database tables for optimal storage of data.
  • Coded JDBC calls in the servlets to access the Oracle database tables.
  • Responsible for integration, unit, system, and stress testing for all phases of the project.
  • Prepared final guideline document that would serve as a tutorial for the users of this application.
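
A minimal sketch of the JAXB marshalling and unmarshalling described above; the EligibilityRequest type and its fields are hypothetical stand-ins for WSDL-generated classes, assuming a JAXB implementation is available (bundled with Java 6+, or supplied as a separate jar on Java 5).

    import java.io.StringReader;
    import java.io.StringWriter;

    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.Marshaller;
    import javax.xml.bind.Unmarshaller;
    import javax.xml.bind.annotation.XmlRootElement;

    public class JaxbExample {

        // Hypothetical request type standing in for a WSDL-generated class.
        @XmlRootElement(name = "eligibilityRequest")
        public static class EligibilityRequest {
            public String caseId;
            public String programCode;
        }

        public static void main(String[] args) throws Exception {
            JAXBContext context = JAXBContext.newInstance(EligibilityRequest.class);

            // Marshal: Java object -> XML payload for the web service call
            EligibilityRequest request = new EligibilityRequest();
            request.caseId = "C-12345";
            request.programCode = "MEDICAL";
            StringWriter xml = new StringWriter();
            Marshaller marshaller = context.createMarshaller();
            marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
            marshaller.marshal(request, xml);

            // Unmarshal: XML response -> Java object
            Unmarshaller unmarshaller = context.createUnmarshaller();
            EligibilityRequest roundTrip = (EligibilityRequest)
                    unmarshaller.unmarshal(new StringReader(xml.toString()));
            System.out.println(roundTrip.caseId);
        }
    }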

Environment: Java 1.5, Servlets, J2EE 1.4, JDBC, Oracle 10g, PL SQL, HTML, JSP, Eclipse, UNIX.
