Hadoop Engineer/Big Data Architect Resume Boston, MA - Hire IT People

SUMMARY

Over 11 years of IT experience in Hadoop, Java, Data mining, Data warehousing and Big Data analysis.
Over 11 years of experience in the design and development of applications and products using Java, shell scripting, Hadoop and SQL.
5 years of experience with Hadoop distributions: CDH 4.7 - 5.10, HDP (Hortonworks 2.6), MapR (5.2) and IBM BigInsights.
Experience with tools in the Hadoop ecosystem (Hive, Impala, Pig Latin, Confidential, Spark, Sqoop and HBase).
2 years of experience writing Spark transformations and actions in Scala for RDDs and DataFrames.
Applied partitioning and bucketing in Hive and designed both managed and external tables to optimize performance.
Developed Spark applications in Scala to ease Hadoop transitions.
Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
Created DataFrames from RDDs in Spark to run Spark SQL queries on them.
Created JDBC connections from Spark to Oracle and other RDBMSs for querying and transformation.
Currently working on the design and development of a Confidential-Hadoop connector called Confidential using Java.
Working on automated and manual testing for an open source project called Confidential.
Good knowledge of Kerberos and other security features used with Hadoop.
Good working experience with file formats such as Avro, JSON, ORC, RC and Parquet.
Worked at Confidential as a Hadoop & Big Data Technical Architect for 1.5 years.
Created a MapR Docker container to automate testing of Confidential.
10 years of experience in Core Java and 2 years in Scala programming for backend database work.
Used multithreading, collections and various other OOP concepts in Java.
Full understanding of Software Development Life Cycle.
3.5 years of experience with the analytical/MPP database HP Confidential for data mining projects.
Good experience in PL/SQL programming, including complex SQL queries inside stored procedures, functions and triggers, using databases such as Oracle, DB2 and SQL Server.
Worked on the Confidential Data Warehouse and created TPT scripts (Confidential Parallel Transfer scripts) to transfer data from Confidential to Hadoop and Confidential for analytics.
Good knowledge of RDBMS, OOD and Client/Server architecture.
Skilled at developing new applications and maintaining existing ones.
Led a team of 4 to 5 members and was responsible for delivering several projects.
Excellent analytical and communication skills.

TECHNICAL SKILLS

Languages: Java, C++, J2EE, JDBC, XML, PL/SQL, Shell scripting, V-SQL, SAS scripting

Analytics: Hadoop, HP Confidential, SAS, TPT, Pig, Hive, HBase, Confidential, Confidential.

Database: Oracle, DB2, Confidential, Confidential, HBase.

Reporting Tools: Crystal Reports 8.0/8.5/11.5

Operating System: Windows, UNIX

PROFESSIONAL EXPERIENCE

Hadoop Engineer/Big Data Architect

Confidential - Boston, MA

Responsibilities:

Working on the architecture, design, development and testing of several features for the latest and upcoming releases of the Confidential Hadoop Connector.
Used Java collections to implement split.by.value support in Confidential.
Used graph, tree, list and other data structures to optimize code flow in Java.
Created a framework for benchmark comparisons between Spark and Confidential.
Used Java multithreading for parallel data transfer using the fast export and fast load functionality of Confidential.
Developed Spark transformations and actions in Scala to compare the Confidential optimizer with the Spark optimizer.
Used reduceByKey, groupByKey, join, union, map and flatMap transformations in Scala.
Used Spark SQL extensively for comparisons between Confidential and Spark.
Designed and developed support for several major data types (varchar, char, JSON, Avro, XML, date, binary) in Confidential.
Developed major functionality to support the Parquet file format in Confidential using Java.
This development involved substantial research and implementation effort to make the Parquet plugin work with Confidential.
Worked with Java serialization and deserialization for network transfer of data.
Created a Docker container to automate testing of Confidential; the pull request is open on GitHub.
Worked on major and minor bug fixes for releases 1.4.2, 1.4.3 and 1.5 of Confidential.
Went through the end-to-end release process of Confidential.
Implemented Avro and Parquet support for Confidential using a Java SerDe.
Used RESTful services and web services to pull structured and unstructured data for automated testing and analysis with Confidential and Confidential.
Used JDBC to connect Confidential and Confidential to Confidential.
Wrote Java code to build prepared statements at runtime.
Provided real-time product engineering support by responding to customer calls and support requests for Confidential.
Built a kerberized environment for testing Confidential on all 4 Hadoop distributions (Cloudera, Hortonworks, MapR and IBM BigInsights).
Worked on manual and automated testing of Kerberos integration with Confidential.
Helped teammates create automation scripts for testing Confidential in a kerberized Hadoop environment.
Manually created a kerberized environment for testing Confidential in a SLES Hadoop (SUSE Linux Hadoop, CentOS) environment.
Working on automated and manual testing for the latest and upcoming releases of Confidential (open source query engine on Hadoop) using Java and shell scripting.
Wrote an automated script to create a Docker cluster for testing Confidential.
Worked on creating and testing a Windows ODBC driver for Confidential.
Worked on testing Confidential integration with Ambari.
Spent several hours on research and development to get the new functionality working for Confidential and Confidential in a Hadoop environment.

Environment and Tools: Hadoop, HDFS, MapReduce, Java, Hive, UNIX Shell Scripting, SQL, Docker, Cloudera Hadoop CDH4.7 to CDH5.8, HDP (2.3, 2.4.2), MapR 5.0, IBM BigInsights (4.0, 4.1), JDBC, Eclipse, Ambari, Cloudera Manager.

Big Data Architect

Confidential - Northbrook, IL

Responsibilities:

Identified the file formats best suited to the Hadoop environment.
Ran a POC to compare file formats such as Avro, JSON, Parquet, ORC and flat files.
Used Impala, Hive and Pig to work with the different file formats.
Used, modified and compared several JSON SerDes, such as the HCatalog SerDe and the Cloudera SerDe, to identify the one that best suited our needs.
Analyzed large volumes of Avro, JSON and flat file data using Apache Spark on Hadoop.
Brought data from RDBMS databases into Hadoop using Apache Spark and its JDBC capability.
Created many transformations and actions on imported RDBMS data using Scala applications.
Performed data mining on the incoming RDBMS data using RDD transformations and actions.
Used transformations such as cogroup, join, groupByKey, union and intersection for data mining.
Connected Hive and Impala to SQL Developer and SQuirreL using Kerberos authentication so that end users could access them easily.
Wrote Java code using JAAS to connect to Impala and Hive through the Simba JDBC driver with Kerberos authentication.
Ran a partitioning POC to identify the columns that need to be partitioned and to compare the performance of partitioned and unpartitioned data across all the file formats.
Working on the architectural design of a Data Hub that will store all available policy, claims and quotes data within Confidential.
Fetched data from Oracle using Sqoop to test Hadoop tools on real data.
Developed an automated testing framework using Java and shell scripts to test the cluster after every Cloudera upgrade. Used HBase, Sqoop, Hive, Pig, Impala and Spark with Scala.
Wrote a JSON schema parser in Java to generate Hive DDL directly from a JSON schema.
Wrote a MapReduce program to validate JSON data against a JSON schema.
Wrote a MapReduce program to convert XML to JSON using EclipseLink.
Working on the architectural design to migrate existing batch operations for various applications to Hadoop.
Wrote several Java programs and shell scripts using named pipes to ingest 100 million (10 TB) of synthetic policy data into HDFS for our POC.
Modified the existing open source Java JSON SerDe to accommodate our needs.
Tested several file formats with data sets of various sizes in Hive, Pig and Impala to measure their performance.
Ran a POC to compare query performance across Hadoop tools such as Hive, Impala and Pig.
Ran a POC on integrating EMC Isilon with Confidential Hadoop clusters.
Ran a POC to integrate Tableau and BO with Cloudera Hadoop.
Ran a POC to identify the best ETL tools for Hadoop, such as Informatica and Ab Initio.
Researching Spark and Phoenix as possible solutions for fast querying in our projects.

Environment: Hadoop, Cloudera Hadoop CDH4.6 to CDH5.3, HDFS, MapReduce, JDBC, Hive, Pig, Impala, Eclipse, Java, PL/SQL, DB2, UNIX Shell Scripting, SQL.

Confidential - Newark, DE

Responsibilities:

Used Eclipse to write Java and MapReduce programs to support several analytics queries.
Responsible for the project release and for managing a team of 3 to complete tasks efficiently and on time.
Worked on Hadoop MapReduce and Hive to load historical data.
Wrote several Hive queries to determine the channels customers use most frequently, identifying which customers are most active on which channels.
Implemented partitioning, dynamic partitions, static partitions and buckets in Hive.
Created several internal and external tables to load and analyze the data.
Exported result sets from Hive to DB2 using shell scripts.
Developed Hive queries for the analysis.
Helped the team grow the cluster from 25 nodes to 56 nodes.
Created documentation procedures for the Confidential implementation.
Analyzed SQL queries in Confidential using the EXPLAIN function.
Understanding of Epoch, LGE and AHM in Confidential.
Integrated Hadoop HDFS with Confidential.
Wrote Java programs to retrieve data from Confidential and feed it to our tablet application.
Wrote Confidential copy commands in shell scripts to copy data from flat files to Confidential.
Created a manual ETL process to load data into Hadoop and Confidential from Oracle and Confidential.
Helped feed analytics results from Confidential to the bctab app.
Worked as ETL Architect to make sure all applications were migrated (along with the server).
Created TPT scripts (Confidential Parallel Transfer scripts) to transfer data from Confidential to Hadoop for analytics.
Deep understanding of and hands-on experience with Hadoop internals, HDFS, Hive and MapReduce.
Deep understanding of schedulers, workload management, availability, scalability and distributed data platforms.
Created several Java classes implementing algorithms over transactional data to help the bank application identify the top retailers and food chains that Confidential customers visit frequently.
This helps provide appropriate rewards and deals to customers based on their transaction history.

Environment: Hadoop, Confidential, HDFS, MapReduce, JDBC, Hive, V-SQL, Eclipse, Java, PL/SQL, Oracle, DB2, UNIX Shell Scripting, SQL, TOAD, Transact-SQL, Confidential, Autosys scheduler, TPT.

Data Mining/Analyst

Confidential - Newark, DE

Responsibilities:

Configured the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
Wrote several Java programs to retrieve data from Confidential and feed it to our DB2 application.
Involved in translating complex business logic from COBOL into Confidential SQL.
Involved in writing scripts to transfer raw data from DB2 to Confidential using the export command in DB2 and the copy command in Confidential.
Good working knowledge of Confidential DB architecture, column orientation, High Availability and Recovery.
Good working knowledge of MPP, clustering, compression and continuous performance.
Database design, partitioning, segmentation, schemas, tables, projections and cluster management.
Imported and exported Confidential tables between databases.
Performed cluster and Confidential resource management with the Management Console.
Experience working with diverse teams to develop big data solutions.
Resolved client technical issues submitted to the technical support team.
Experience working closely with data scientists on the Big Data platform.
Created a manual ETL process to load data into Hadoop and Confidential from DB2 and Confidential.
Involved in creating a proper unit test bed to test the code migrated from COBOL to V-SQL.
Expert knowledge of developing and debugging in Java.
Involved in comparing new data sets generated from V-SQL (through SAS scripting) with the data sets from the original source so that all differences could be eliminated.
Involved in automating the end-to-end process using the Autosys scheduler. Every step of ETL (Extract, Transform and Load) was automated.
Worked on Hadoop MapReduce and Hive to load historical data.

Environment: Confidential, DB2, SAS, UNIX Shell Scripting, V-SQL, JAVA, JDBC, PL/SQL, ORACLE, TOAD, Autosys scheduler, Hadoop.

Big Data Design and Application Developer

Confidential

Responsibilities:

Wrote several core Java programs to invoke stored procedures using callable statements.
Also wrote several programs to retrieve and read data from array objects using the Java Collections Framework.
Worked on generating stubs for web services.
Used Ant for debugging and unit testing the code.
Helped design the database so that it follows RDBMS standards.
Helped write several stored procedures and functions used to fetch data from the backend.
Extracted, transformed and loaded data between different DW architectures.
Implemented the Confidential database warehouse cluster.
Performed DB monitoring, backup and restore, and performance tuning.
Ran complex SQL queries using V-SQL.
Created a manual ETL process to load historical data into Hadoop and analytical data into Confidential from DB2, Oracle and Confidential.

Environment: DB2, SAS, Java, UNIX Shell Scripting, Confidential, V-SQL, Hadoop, JDBC, Oracle SQL, PL/SQL, TOAD, Hibernate, Spring Batch

SQL Developer

Confidential

Responsibilities:

Design of the application database.
Responsible for creating a multithreading environment in the system and using collections and maps for scalability.
Development work including Crystal Reports, writing stored procedures, building the UI screens, coding and unit testing.
Wrote several stored procedures and functions to implement new functionality.
Unit Testing and System Testing.
Responsible for improving the system to become more scalable and allow multiple users to use the same areas at the same time.
Responsible for optimizing the long running queries and making the system faster.
Responsible for implementing the business functionality in Java and for creating various complex stored procedures and tables in MS SQL Server.
Responsible for creating web pages in JSP and the backend in Java using island framework.

Environment: Object Oriented Methodology, C++, Business Objects, SQL server, T-SQL.
