
Big Data Engineer Resume


New York City, NY

SUMMARY

  • Overall 7+ years of professional IT experience as a software developer with a background in analysis, development, integration and testing of applications.
  • 4.5 years of experience as a Hadoop Developer and Big Data analyst.
  • Solid expertise in Hadoop internals, architecture and supporting ecosystem components like Hive, Sqoop, Pig and Oozie.
  • Apart from developing on the Hadoop ecosystem, also have good experience in installing and configuring Cloudera's distribution (CDH 3, 4 and 5), the Hortonworks distribution (HDP 2.1 and 2.2) and IBM BigInsights (2.1.2 and 3.0.1).
  • Good experience setting up and configuring Hadoop clusters on Amazon Web Services (EC2) with nodes running CentOS 5.4, 6.3 and RHEL.
  • Hands-on experience working with ecosystem components like Hive, Pig, Sqoop, MapReduce, Flume and Oozie.
  • Strong knowledge of Pig and Hive's analytical functions; extended Hive and Pig core functionality by writing custom UDFs (a minimal UDF sketch follows this list).
  • Adept at HiveQL, with good experience of time-based partitioning, dynamic partitioning and bucketing to optimize Hive queries. Also used Hive's MapJoin to speed up queries when possible.
  • Used Hive to create tables in both delimited text storage format and binary storage format.
  • Have excellent working experience with the two popular Hadoop binary storage formats, Avro data files and SequenceFiles.
  • Good Knowledge of analyzing data in HBase using Hive and Pig.
  • Working Knowledge in NoSQL Databases like HBase.
  • Also have experience developing Hive UDAFs to apply custom aggregation logic.
  • Created Pig Latin scripts made up of a series of operations and transformations applied to the input data to produce the required output.
  • Good experience with the range of Pig functions like Eval, Filter, Load and Store functions.
  • Good working experience using Sqoop to import data into HDFS from RDBMS and vice versa. Also have good experience using Sqoop direct mode with external tables to perform very fast data loads.
  • Good experience on Linux shell scripting.
  • Good knowledge of ETL tools like DataStage and Informatica.
  • Used the Oozie engine for creating workflow and coordinator jobs that schedule and execute various Hadoop jobs such as MapReduce, Hive, Pig and Sqoop operations.
  • Experienced in developing Java MapReduce programs using Apache Hadoop for analyzing data as per the requirement.
  • Solid experience writing complex SQL queries. Also experienced in working with NoSQL databases like HBase.
  • Experienced in creative and effective front-end development using JSP, JavaScript, HTML 5, DHTML, XHTML, Ajax and CSS.
  • Experience in database design using PL/SQL to write stored procedures, functions and triggers, and strong experience in writing complex queries for Oracle 8i/9i/10g.
  • Have extensive experience in building and deploying applications on Web/Application Servers like WebLogic, WebSphere and Tomcat.
  • Experience in Building, Deploying and Integrating with Ant, Maven.
  • Experience in processing the hive table data using Spark.
  • Experience in development of logging standards and mechanisms based on Log4j.
  • Strong work ethic with a desire to succeed and make significant contributions to the organization.
  • Complementing my technical skills are my solid communication skills.
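As a brief illustration of the custom Hive UDF work mentioned above, the sketch below shows a minimal UDF in Java; the class and column names are hypothetical examples, not from any specific project.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Minimal sketch of a Hive UDF (hypothetical example): normalizes a string
// column by trimming whitespace and lower-casing it. Registered in HiveQL
// with: CREATE TEMPORARY FUNCTION normalize_str AS 'NormalizeString';
public class NormalizeString extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null; // preserve NULLs as Hive expects
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```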

TECHNICAL SKILLS

Hadoop Ecosystem Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Hbase and Oozie

Programming Languages: Java (JDK 1.4/1.5/1.6), SQL and PL/SQL

Frameworks: Hibernate 2.x/3.x, Struts 1.x/2.x

Web Services: WSDL, SOAP, Apache CXF/XFire, Apache Axis, REST, Jersey

Client Technologies: jQuery, JavaScript, AJAX, CSS, HTML 5, XHTML

Application Servers: IBM WebSphere, Tomcat, WebLogic

Web technologies: JSP, Servlets, JNDI, JDBC, JavaBeans, JavaScript, Web Services (JAX-WS)

RDBMS Databases: Oracle 8i/9i/10g, MySQL 4.x/5.x, DB2

NoSQL Database: HBase

Tools: TOAD, SQL Developer, SOAP UI, ANT, Maven, Visio and Rational Rose

PROFESSIONAL EXPERIENCE

Confidential, New York City, NY

Big Data Engineer

Responsibilities:

  • Involved in the full project life cycle: design, analysis, logical and physical architecture modeling, development, implementation and testing.
  • Wrote complex Hive and SQL queries for data analysis to meet business requirements.
  • Expert in importing and exporting data into HDFS and Hive using Sqoop.
  • Working experience on designing and implementing complete end-to-end Hadoop Infrastructure including Pig, Hive, Sqoop, Oozie and Zookeeper.
  • Expert in writing HiveQL queries and Pig Latin scripts.
  • Experience in importing and exporting terabytes of data using Sqoop from Relational Database Systems to HDFS.
  • Experience in providing support to data analyst in running Pig and Hive queries
  • Good experience in Hive partitioning and bucketing, performing different types of joins on Hive tables and implementing Hive SerDes such as RegEx, JSON and Avro.
  • Developed custom Hive UDFs and UDAFs in Java, connected to Hive over JDBC (a minimal connectivity sketch follows this list), and developed and executed Pig scripts and Pig UDFs.
  • Used Sqoop to migrate data to and from HDFS and MySQL or Oracle, and deployed Hive and HBase integration to perform OLAP operations on HBase data.
  • Used Flume in Loading log data into HDFS.
  • Created HIVE managed and external tables.
  • Moved data from Oracle, Teradata and MS SQL Server into HDFS using Sqoop and imported various formats of flat files into HDFS.
  • Responsible for design and creation of Hive tables and worked on various performance optimizations like partitioning and bucketing in Hive. Handled incremental data loads from RDBMS into HDFS using Sqoop.
  • Used the Oozie scheduler to automate the pipeline workflow and orchestrate the Sqoop, Hive and Pig jobs that extract the data in a timely manner.
  • Exported the generated results to Tableau for testing by connecting to the corresponding Hive tables using Hive ODBC connector.
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
  • Analyzed data using HiveQL to generate payer reports from payment summaries for transmission to payers.
  • Extensively worked on Pig scripts for data cleansing and optimization.
  • Responsible for design and creation of Hive tables, partitioning, bucketing, loading data and writing hive queries.
  • Importing and exporting data into HDFS, Hive and Hbase using Sqoop from Relational Database.
  • Exported analyzed data to downstream systems using Sqoop for generating end-user reports, Business Analysis reports and payment reports.
  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Developed Hive and Impala scripts on Avro and Parquet file formats.
  • Experience in NoSQL databases such as HBase.
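The sketch below illustrates the Hive JDBC connectivity mentioned above; the host, port, credentials, table and query are hypothetical placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Minimal sketch (hypothetical example): connect to HiveServer2 over JDBC
// and run a simple aggregation query.
public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        // Load the HiveServer2 JDBC driver
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery(
                "SELECT payer_id, COUNT(*) FROM payments GROUP BY payer_id");
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}
```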

Environment: MapR Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Flume, Oozie, Storm, Spark, Scala, XML, MS Access 2003, Java.

Confidential, Philadelphia, PA

Hadoop Developer

Responsibilities:

  • Involved in review of functional and non-functional requirements.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (a minimal mapper sketch follows this list).
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Extracted files from CouchDB through Sqoop, placed them in HDFS and processed them.
  • Worked on running Hadoop streaming jobs to process terabytes of XML-format data.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Designed and Implemented Real time applications using Apache Storm, Kafka, and Accumulo.
  • Tested MapReduce programs using MRUnit.
  • Involved in loading data from UNIX file system to HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Gained very good business knowledge on health insurance, claim processing, fraud suspect identification, appeals process etc.
  • Developed a custom file system plug-in for Hadoop so it can access files on the data platform.
  • This plug-in allows Hadoop MapReduce programs, HBase, Pig and Hive to work unmodified and access files directly.
  • Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
  • Worked on migrating data from SQL Server to MongoDB.
  • Setup and benchmarked Hadoop/HBase clusters for internal use.
  • Set up a Hadoop cluster on Amazon EC2 using Whirr for a POC.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
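As a brief illustration of the Java MapReduce cleansing jobs mentioned above, here is a minimal map-only sketch; the delimiter and expected field count are hypothetical assumptions.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Minimal sketch (hypothetical example): a map-only cleansing step that
// drops blank or malformed CSV records and passes through the rest.
public class CleansingMapper
        extends Mapper<LongWritable, Text, NullWritable, Text> {

    private static final int EXPECTED_FIELDS = 12; // assumed record width

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString().trim();
        if (line.isEmpty()) {
            return; // skip blank lines
        }
        String[] fields = line.split(",", -1);
        if (fields.length == EXPECTED_FIELDS) {
            context.write(NullWritable.get(), value); // keep well-formed records
        }
    }
}
```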

Environment: Java 6 (JDK 1.6), Eclipse, Oracle 11g/10g, Subversion, Hadoop (Hortonworks and Cloudera distributions), Hive, HBase, Linux, MapReduce, HDFS, MongoDB, DataStax, IBM DataStage 8.1, PL/SQL, SQL*Plus, Toad 9.6, Windows NT, UNIX Shell Scripting.

Confidential, Farmington Hills, MI

Hadoop Developer

Responsibilities:

  • Developed MapReduce jobs in Java for data cleansing and pre-processing.
  • Moved data from DB2 and Oracle Exadata to HDFS and vice versa using Sqoop.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Worked with different file formats and compression techniques to determine standards.
  • Developed Hive queries and UDFs to analyze/transform the data in HDFS.
  • Developed Hive scripts for implementing control-table logic in HDFS.
  • Designed and Implemented Partitioning (Static, Dynamic), Buckets in HIVE.
  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig and the HBase database.
  • Developed Pig scripts and UDFs as per the business logic (a minimal Pig UDF sketch follows this list).
  • Developed user defined functions in Pig using Python.
  • Analyzing/Transforming data with Hive and Pig.
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Worked on partitioning HIVE tables and running the scripts in parallel to reduce run-time of the scripts.
  • Developed Oozie workflows, scheduled through a scheduler on a monthly basis.
  • Designed and developed read lock capability in HDFS.
  • Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
  • Implemented a Hadoop float type equivalent to the DB2 decimal type.
  • Participated in performance tuning at the database, transformation and job levels.
  • Coordinated effectively with the offshore team and managed project deliverables on time.
  • Worked on QA support activities, test data creation and Unit testing activities.
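The sketch below illustrates a Pig UDF of the kind referenced above, written in Java against Pig's EvalFunc API; the class name and behavior are hypothetical examples.

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Minimal sketch (hypothetical example): a Pig EvalFunc that upper-cases a
// chararray field. Used from a Pig script after REGISTER/DEFINE, e.g.:
//   REGISTER myudfs.jar;
//   DEFINE ToUpper ToUpper();
public class ToUpper extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null; // pass nulls through
        }
        return input.get(0).toString().toUpperCase();
    }
}
```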

Environment: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Oozie, Flume, Java (JDK 1.6), Eclipse

Confidential, Chicago, IL

Java Developer

Responsibilities:

  • Involved in requirements gathering, design, development, unit testing and bug fixing.
  • Used Agile methodologies to manage full life-cycle development of the project.
  • Developed the application using Struts, Spring and Hibernate.
  • Developed a rich user interface using JavaScript, JSTL, CSS, jQuery and JSPs.
  • Developed custom tags for implementing logic in JSPs.
  • Used JavaScript, jQuery, JSTL, CSS and Struts 2 tags for developing the JSPs.
  • Involved in making release builds for deploying the application for test environments.
  • Used Oracle database as backend database.
  • Wrote SQL to update and create database tables.
  • Used Eclipse as IDE.
  • Used the RIDC interface to get content details and create content through the application.
  • Used Spring IoC for injecting the beans.
  • Used Hibernate for connecting to the database and mapping the entities using Hibernate annotations (a minimal entity sketch follows this list).
  • Created JUnit test cases for unit testing application.
  • Used JUnit and jMock for unit testing.
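As a brief illustration of the annotation-based Hibernate mapping mentioned above, here is a minimal entity sketch; the entity, table and column names are hypothetical examples.

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;

// Minimal sketch (hypothetical example): a Hibernate-annotated entity
// mapped to a CUSTOMER table with a generated primary key.
@Entity
@Table(name = "CUSTOMER")
public class Customer {

    @Id
    @GeneratedValue
    @Column(name = "CUSTOMER_ID")
    private Long id;

    @Column(name = "NAME", nullable = false)
    private String name;

    public Long getId() { return id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}
```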

Environment: Java, J2EE, Struts MVC, JDBC, JSP, JavaScript, HTML, Ant, WebSphere Application Server, Oracle, JUnit, Log4j, Eclipse

Confidential

Java Developer

Responsibilities:

  • Implemented object-oriented analysis and design concepts using UML, including development of class, sequence and state diagrams, and implemented these diagrams in StarUML.
  • Developed the user interface using Java Server Pages utilizing custom tag libraries and JavaScript.
  • Built and deployed EAR, WAR and JAR files on development, test and production systems in JBoss Application Server.
  • Involved in designing Servlets, JSP pages, deploying and testing them in eclipse.
  • Responsible for creation and execution of Unit and Integration Tests.
  • Used the SAX API for accessing XML documents and notifying the application of a stream of parsing events (a minimal handler sketch follows this list).
  • Designed and implemented unmarshallers/marshallers with the help of Apache Axis to store the entire XML data in Java objects and vice versa.
  • Data retrieval and storage in Oracle database. Retrieval of data from database using JDBC Connectivity.
  • Used XSL/XSLT for transforming and displaying reports. Developed DTDs for XML.
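The sketch below illustrates the SAX event-driven parsing mentioned above; the element name and counting logic are hypothetical examples.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Minimal sketch (hypothetical example): a SAX handler that reacts to
// parsing events as the XML document is streamed, counting <record> elements.
public class RecordCountHandler extends DefaultHandler {

    private int recordCount = 0;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        if ("record".equals(qName)) {
            recordCount++;
        }
    }

    public int getRecordCount() {
        return recordCount;
    }

    public static void main(String[] args) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        RecordCountHandler handler = new RecordCountHandler();
        parser.parse(args[0], handler); // stream the document through the handler
        System.out.println("records: " + handler.getRecordCount());
    }
}
```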

Environment: Java/J2EE, JSP, Servlets, Struts, JavaScript, AJAX, HTML, CSS, JDeveloper IDE, Oracle 9i, Ant, Apache Tomcat Web Server.

Confidential

Java Developer

Responsibilities:

  • Implemented various J2EE design patterns for designing this application.
  • Used the Business Delegate, Service Locator and DTO patterns for designing the web module of the application.
  • Used the Factory and Singleton design patterns for implementing enterprise modules/DTOs (a minimal Singleton sketch follows this list).
  • Developed the web interface using Struts, JavaScript, HTML and CSS.
  • Extensively used the Struts controller component classes for developing the applications.
  • Extensively used the Struts application resources properties file for error codes, view labels and product internationalization.
  • Used RAD (Rational Application Developer 7.0) as the development platform.
  • Used Struts 1.2, which provides its own controller component and integrates with other technologies to provide the model and the view; for the model, used Struts with standard data access technologies like JDBC and EJB.
  • Used JavaBeans to store a number of different collections of attributes within the scope choices defined by the JavaServer Pages (JSP) specification.
  • Used SVN for source code control and JUnit for unit testing.
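As a brief illustration of the Singleton pattern mentioned above, here is a minimal sketch; the class name and its role as a registry holder are hypothetical examples.

```java
// Minimal sketch (hypothetical example): an eagerly initialized, thread-safe
// Singleton used as a registry-style holder.
public final class ServiceRegistry {

    private static final ServiceRegistry INSTANCE = new ServiceRegistry();

    private ServiceRegistry() {
        // private constructor prevents external instantiation
    }

    public static ServiceRegistry getInstance() {
        return INSTANCE;
    }
}
```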

Environment: Java, J2EE, Struts MVC, JDBC, JSP, JavaScript, HTML, WebSphere Application Server, Oracle, JUnit and Log4j
