Hadoop Engineer/Big Data Architect Resume
Boston, MA
SUMMARY
- Over 11 years of IT experience in Hadoop, Java, data mining, data warehousing and Big Data analysis.
- Over 11 years of experience in design and development of applications and products using Java, shell scripting, Hadoop and SQL.
- 5 years of experience with Hadoop distributions: CDH 4.7 to 5.10, HDP (Hortonworks 2.6), MapR 5.2 and IBM BigInsights.
- Experience with tools in the Hadoop ecosystem (Hive, Impala, Pig Latin, Confidential, Spark, Sqoop and HBase).
- 2 years of experience writing Spark transformations and actions in Scala for Spark RDDs and DataFrames.
- Implemented partitioning and bucketing in Hive and designed both managed and external tables to optimize performance.
- Developed Spark applications using Scala for easy Hadoop transitions.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Created DataFrames in Spark from RDDs to run Spark SQL queries on them.
- Created JDBC connections from Spark to Oracle and other RDBMSs for querying and transformation.
- Currently working on design and development of a Confidential-Hadoop connector called Confidential using Java.
- Working on automated and manual testing for an open source project called Confidential.
- Good knowledge of Kerberos and other security features used with Hadoop.
- Good working experience with file formats such as Avro, JSON, ORC, RC and Parquet.
- Worked at Confidential as Hadoop & Big Data Technical Architect for 1.5 years.
- Created a MapR Docker container to automate testing of Confidential.
- 10 years of experience in core Java and 2 years in Scala programming for backend database work.
- Used multithreading, collections and various other OOP concepts in Java.
- Full understanding of Software Development Life Cycle.
- 3.5 years of experience with the analytical/MPP database HP Confidential for data mining projects.
- Good experience in PL/SQL programming, including complex SQL queries inside stored procedures, functions and triggers, using databases such as Oracle, DB2 and SQL Server.
- Worked on Confidential Data Warehouse and created TPT scripts (Confidential Parallel Transfer scripts) to transfer data from Confidential to Hadoop and Confidential for analytics purposes.
- Good knowledge of RDBMS, OOD and Client/Server architecture.
- Skillful at programming new applications and maintenance.
- Led a team of 4 to 5 members and was responsible for delivery of several projects.
- Excellent analytical and communication skills.
TECHNICAL SKILLS
Languages: Java, C++, J2EE, JDBC, XML, PL/SQL, Shell scripting, V-SQL, SAS scripting
Analytics: Hadoop, HP Confidential, SAS, TPT, Pig, Hive, HBase, Confidential, Confidential.
Databases: Oracle, DB2, Confidential, Confidential, HBase.
Reporting Tools: Crystal Reports 8.0/8.5/11.5
Operating System: Windows, UNIX
PROFESSIONAL EXPERIENCE
Hadoop Engineer/Big Data Architect
Confidential - Boston, MA
Responsibilities:
- Working on architecture, design, development and testing of several functionalities for the latest and upcoming releases of the Confidential Hadoop connector.
- Used Java collections to implement the design of split.by.value support in Confidential.
- Used graph, tree, list and other data structures to help optimize code flow in Java.
- Created a framework for benchmark test comparisons between Spark and Confidential.
- Used Java multithreading for parallel data transfer using the fast export and fast load functionality of Confidential.
- Developed Spark transformations and actions using Scala to compare the Confidential optimizer with the Spark optimizer.
- Used reduceByKey, groupByKey, join, union, map and flatMap transformations in Scala.
- Extensively used Spark SQL for comparison between Confidential and Spark.
- Designed and developed support for several major data types (varchar, char, JSON, Avro, XML, date, binary) in Confidential.
- Developed major functionality to support the Parquet file format in Confidential using Java.
- This development involved substantial research and implementation effort to make the Parquet plugin work with Confidential.
- Worked on Java serialization and deserialization for network transfer of data.
- Created a Docker container to automate testing of Confidential; the open PR is on GitHub.
- Worked on major and minor bug fixes for releases 1.4.2, 1.4.3 and 1.5 of Confidential.
- Went through the end-to-end release process of Confidential.
- Implemented Avro and Parquet support for Confidential using Java SerDe.
- Used RESTful services and web services to pull unstructured and structured data for automation testing and analysis with Confidential and Confidential.
- Used JDBC connections to connect Confidential and Confidential to Confidential.
- Wrote code to build prepared statements at runtime using Java.
- Provided real-time product engineering support by responding to customer calls and support requests for Confidential.
- Built kerberized environments for testing Confidential on all 4 Hadoop distributions (Cloudera, Hortonworks, MapR and IBM BigInsights).
- Worked on manual and automated testing of Kerberos integration with Confidential.
- Helped teammates create automation scripts for testing Confidential in a kerberized Hadoop environment.
- Manually created a kerberized environment for testing Confidential in SLES Hadoop (SUSE Linux Hadoop, CentOS) environments.
- Working on automated and manual testing for the latest and upcoming releases of Confidential (open source query engine on Hadoop) using Java and shell scripting.
- Wrote an automated script to create a Docker cluster for testing Confidential.
- Worked on creating and testing the Windows ODBC driver for Confidential.
- Worked on testing Confidential integration with Ambari.
- Spent several hours on research and development to get the new functionality working for Confidential and Confidential in Hadoop environments.
Environment and Tools: Hadoop, HDFS, MapReduce, Java, Hive, UNIX Shell Scripting, SQL, Docker, Cloudera Hadoop CDH 4.7 to CDH 5.8, HDP (2.3, 2.4.2), MapR 5.0, IBM BigInsights (4.0, 4.1), JDBC, Eclipse, Ambari, Cloudera Manager.
Big Data Architect
Confidential - Northbrook, IL
Responsibilities:
- Identified the file formats best suited to the Hadoop environment.
- Ran a POC to compare file formats such as Avro, JSON, Parquet, ORC and flat files.
- Used Impala, Hive and Pig to work with the different file formats.
- Used, modified and compared several JSON SerDes, such as the HCatalog SerDe and the Cloudera SerDe, to identify the one that best suited our needs.
- Analyzed large volumes of Avro, JSON and flat file data using Apache Spark on Hadoop.
- Brought data from RDBMS databases into Hadoop using Apache Spark and its JDBC capability.
- Created many transformations and actions on imported RDBMS data using Scala applications.
- Performed data mining on the incoming RDBMS data using RDD transformations and actions.
- Used transformations such as cogroup, join, groupByKey, union and intersection for data mining.
- Connected Hive and Impala to SQL Developer and SQuirreL using Kerberos authentication so that end users could use them easily.
- Wrote Java code using JAAS to connect to Impala and Hive through the Simba JDBC driver with Kerberos authentication.
- Ran a partitioning POC to identify the columns that needed to be partitioned and to compare the performance of partitioned and unpartitioned data across all the file formats.
- Working on the architectural design for a Data Hub that will store all available policy, claims and quotes data within Confidential.
- Fetched data from Oracle using Sqoop to test Hadoop tools on real data.
- Developed an automation testing framework using Java and shell scripts to test the cluster after every Cloudera upgrade. The framework exercised HBase, Sqoop, Hive, Pig, Impala and Spark (using Scala).
- Wrote a JSON schema parser in Java to create Hive DDL directly from a JSON schema.
- Wrote a MapReduce program to validate JSON data against a JSON schema.
- Wrote a MapReduce program to convert XML to JSON using EclipseLink.
- Working on the architectural design to migrate existing batch operations for various applications to Hadoop.
- Wrote several Java programs and shell scripts using named pipes to ingest 100 million (10 TB) of synthetic policy data into HDFS for our POC.
- Modified the existing open source Java JSON SerDe to accommodate our needs.
- Tested several file formats with data sets of various sizes in Hive, Pig and Impala to measure their performance.
- Ran a POC to compare query performance across Hadoop tools such as Hive, Impala and Pig.
- Ran a POC on integrating EMC Isilon with Confidential Hadoop clusters.
- Ran a POC to integrate Tableau and BO with Cloudera Hadoop.
- Ran a POC to identify the best ETL tools for Hadoop, such as Informatica and Ab Initio.
- Researching Spark and Phoenix as possible solutions for fast querying in our projects.
Environment: Hadoop, Cloudera Hadoop CDH 4.6 to CDH 5.3, HDFS, MapReduce, JDBC, Hive, Pig, Impala, Eclipse, Java, PL/SQL, DB2, UNIX Shell Scripting, SQL.
Confidential - Newark, DE
Responsibilities:
- Used Eclipse to write Java and MapReduce programs to support several analytics queries.
- Responsible for the project release and for managing a team of 3 to implement tasks efficiently and on time.
- Worked on Hadoop MapReduce and Hive to load historical data.
- Wrote several Hive queries to determine the channels customers use most frequently, identifying which customers are most active on which channels.
- Implemented partitioning, dynamic partitions, static partitions and buckets in Hive.
- Created several internal and external tables to load and analyze the data.
- Exported result sets from Hive to DB2 using shell scripts.
- Developed Hive queries for the analysis.
- Helped the team grow the cluster from 25 nodes to 56 nodes.
- Created documentation procedures for Confidential implementation.
- Analyzed SQL queries in Confidential using the EXPLAIN function.
- Understanding of Epoch, LGE and AHM in Confidential.
- Integrated Hadoop HDFS with Confidential.
- Wrote Java programs to retrieve data from Confidential and feed it to our tablet application.
- Wrote the Confidential copy command in a shell script to copy data from flat files to Confidential.
- Created a manual ETL process to load data into Hadoop and Confidential from Oracle and Confidential.
- Helped feed analytics results from Confidential to the bctab app.
- Worked as ETL Architect to make sure all the applications were migrated (along with the server).
- Created TPT scripts (Confidential Parallel Transfer scripts) to transfer data from Confidential to Hadoop for analytics purposes.
- Deep understanding of and related experience with Hadoop internals, including HDFS, Hive and MapReduce.
- Deep understanding of schedulers, workload management, availability, scalability and distributed data platforms.
- Created several Java classes implementing algorithms over transactional data to help the bank application identify the top retailers and food chains that Confidential customers visit frequently.
- This helps provide appropriate rewards and deals to customers based on their transaction history.
Environment: Hadoop, Confidential, HDFS, MapReduce, JDBC, Hive, V-SQL, Eclipse, Java, PL/SQL, Oracle, DB2, UNIX Shell Scripting, SQL, TOAD, Transact-SQL, Confidential, Autosys scheduler, TPT.
Data Mining/Analyst
Confidential - Newark, DE
Responsibilities:
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
- Wrote several Java programs to retrieve data from Confidential and feed it to our DB2 application.
- Involved in translating complex business logic from COBOL into Confidential SQL.
- Involved in writing scripts to transfer raw data from DB2 to Confidential using the export command in DB2 and the copy command in Confidential.
- Good working knowledge of Confidential DB architecture, column orientation, high availability and recovery.
- Good working knowledge of MPP, clustering, compression and continuous performance.
- Database design, partitioning, segmentation, schemas, tables, projections and cluster management.
- Imported and exported Confidential tables between databases.
- Performed cluster and Confidential resource management with the Management Console.
- Experience working with diverse teams in developing big data solutions.
- Resolved client technical issues submitted to the technical support team.
- Experience working closely with data scientists on the Big Data platform.
- Created a manual ETL process to load data into Hadoop and Confidential from DB2 and Confidential.
- Involved in creating a proper unit test bed to test code migrated from COBOL to V-SQL.
- Expert knowledge of developing and debugging in Java.
- Involved in comparing new data sets generated from V-SQL (through SAS scripting) with the data sets from the original source so that all differences were eliminated.
- Involved in automating the end-to-end process using the Autosys scheduler. Every step of ETL (Extract, Transform and Load) was automated.
- Worked on Hadoop MapReduce and Hive to load historical data.
Environment: Confidential, DB2, SAS, UNIX Shell Scripting, V-SQL, Java, JDBC, PL/SQL, Oracle, TOAD, Autosys scheduler, Hadoop.
Big Data Design and Application Developer
Confidential
Responsibilities:
- Wrote several core Java programs to manage stored procedures using callable statements.
- Also wrote several programs to retrieve and read data from array objects using the Java Collections Framework.
- Worked on generating stubs for web services.
- Used Ant for debugging and unit testing the code.
- Helped design the database so that it follows RDBMS standards.
- Helped write several stored procedures and functions used to fetch data from the backend.
- Extracted, transported and loaded data between different DW architectures.
- Implemented the Confidential data warehouse cluster.
- Performed DB monitoring, backup and restore, and performance tuning.
- Ran complex SQL queries using V-SQL.
- Created a manual ETL process to load historical data into Hadoop and analytical data into Confidential from DB2, Oracle and Confidential.
Environment: DB2, SAS, Java, UNIX Shell Scripting, Confidential, V-SQL, Hadoop, JDBC, Oracle SQL, PL/SQL, TOAD, Hibernate, Spring Batch
SQL Developer
Confidential
Responsibilities:
- Designed the application database.
- Responsible for creating a multithreaded environment in the system and using collections and maps for scalability.
- Development work including Crystal Reports, writing stored procedures, and building the UI for creating screens, along with coding and unit testing.
- Wrote several stored procedures and functions to implement new functionality.
- Unit Testing and System Testing.
- Responsible for making the system more scalable and allowing multiple users to use the same areas at the same time.
- Responsible for optimizing long-running queries and making the system faster.
- Responsible for implementing the business functionality in Java and for creating various complex stored procedures and tables in MS SQL Server.
- Responsible for creating web pages in JSP and the backend in Java using the island framework.
- Responsible for creating complex stored procedures.
- Responsible for designing the database.
Environment: Object Oriented Methodology, C++, Business Objects, SQL server, T-SQL.
