Hadoop/Spark Developer Resume
VA
SUMMARY:
- 8+ years of IT experience in the development, support, and implementation of software applications, including Big Data experience with Hadoop, Hive 0.8.1.3, Impala 2.1.0, Pig 0.14.0, Sqoop, Flume, HBase, Oozie, MapReduce 2.7.0, Spark 2.2.1, and Scala 2.6.
- Solid foundation in mathematics, probability, and statistics, with broad practical experience in statistical and data mining techniques gained through industry work and academic programs.
- Strong working experience with Spark Core, Spark SQL, and Spark Streaming and their core abstraction APIs: RDD, DataFrame/Dataset, and DStreams.
- Experience in developing deliverable documentation, including data flows, use cases, and business rules.
- Developed efficient MapReduce programs on the AWS cloud to detect and separate fraudulent claims across more than 20 years of claim data.
- Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into HDFS on AWS using Sqoop and Flume.
- Used the Spark and Spark SQL APIs for pre-processing and analytics of data.
- Integrated Hive with Spark to load Hive table data into DataFrames for transformation.
- Developed frameworks to read and write data from Redshift.
- Developed a framework in Spark/Scala to ingest CSV and TSV files into HDFS using the Databricks CSV reader (see the sketch at the end of this summary).
- Unit tested Spark code with Spark-Testing-Base; deployed jobs in YARN cluster mode through Oozie Spark actions.
- Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Strong knowledge of Java object-oriented programming concepts, multi-threading, exception handling, the Collections Framework, and J2EE technologies.
- Expertise in writing Sqoop jobs to migrate large volumes of data between relational database systems and HDFS/Hive tables according to client requirements.
- Worked extensively with data migration, data cleansing, data profiling, and ETL processes for data warehouses.
- Capable of processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture.
- Good understanding of NoSQL databases such as HBase and Cassandra and their integration with a Hadoop cluster to store and retrieve large amounts of data.
- Experience in designing and developing POCs in Spark using Scala/Java to compare the performance of Spark with Hive and RDBMS.
- Strong working experience in analyzing large datasets to find patterns and insights within structured, semi-structured, and unstructured data.
- Experience in Extraction, Transformation & Loading (ETL) of data in different file formats (CSV, text, SequenceFile, Avro, Parquet, JSON, ORC) based on business needs, using compression codecs such as gzip and Snappy for better performance.
- Hands-on experience in writing complex SQL queries against relational databases such as Oracle, SQL Server, and MySQL.
- Strong understanding of Agile Scrum and Waterfall SDLC methodologies.
- Strong communication, collaboration & team building skills with proficiency at grasping new Technical concepts quickly and utilizing them in a productive manner.
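As a concrete illustration of the Spark/Scala ingestion work summarized above, the sketch below reads delimited files into a DataFrame and saves the result as a Hive table. This is a minimal sketch: the paths, table names, and options are hypothetical, and Spark 2.x's built-in CSV reader stands in for the external Databricks spark-csv package needed on older Spark versions.

    import org.apache.spark.sql.SparkSession

    object CsvIngest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("CsvIngest")
          .enableHiveSupport() // share the Hive metastore so tables are queryable from Hive
          .getOrCreate()

        // Spark 2.x ships a CSV reader built in; on Spark 1.x this was
        // .format("com.databricks.spark.csv") from the Databricks spark-csv package.
        val df = spark.read
          .option("header", "true")
          .option("delimiter", "\t") // "," for CSV files, "\t" for TSV files
          .option("inferSchema", "true")
          .csv("hdfs:///landing/claims/") // hypothetical landing directory

        // Persist as a Hive table for downstream Spark SQL / Hive queries.
        df.write.mode("overwrite").saveAsTable("staging.claims") // hypothetical table
      }
    }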
TECHNICAL SKILLS:
Operating Systems / Platforms: Red Hat, CentOS, Unix/Linux 6.7, Windows 7/8/10.
Big Data Ecosystems: MapReduce 2.7.0, HDFS, Yarn, Hive 0.8.1.3, Impala 2.1.0, Pig 0.14.0, Sqoop, Spark 2.2
Programming Languages: Java 1.8, Scala 2.6, Python 3.4.2, Shell Scripting.
Databases: MySQL, Oracle 12.1.0.1, SQL Server, HBase, Cassandra.
Hadoop Distributions: Cloudera and Hortonworks.
Cloud Services: Amazon (EC2, EMR, S3, Redshift, CloudFront).
Repository: Git, SVN.
Build & Deploy Tools: Maven, SBT, Gradle.
IDE Tools: Eclipse, IntelliJ IDEA.
Reporting Tools: Tableau, Arcadia.
PROFESSIONAL EXPERIENCE:
Confidential, VA
Hadoop/Spark Developer.
Responsibilities:
- Collaborated on insights with other Data Scientists and Business analysts.
- Worked on a live 60-node Hadoop cluster running Cloudera's distribution of Apache Hadoop.
- Designed and implemented scalable infrastructure for large-scale data ingestion, aggregation, and analytics in Spark.
- Developed Sqoop jobs to import data from an Oracle database, handling incremental loads of customer transaction data by date.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Designed and created Hive external tables on a shared metastore, with partitioning and bucketing.
- Implemented Spark applications in Scala, using DataFrames and the Spark SQL API for faster data processing (see the sketch after this list).
- Used Spark for interactive queries and processing of streaming data, integrating with HDFS for large data volumes.
- Worked with various file formats like Parquet, JSON, CSV and compression formats like Snappy.
- Performed advanced procedures such as text analytics and processing using Spark's in-memory computing capabilities.
- Worked with Spark to create structured data from the pool of unstructured data received in HDFS.
- Experienced in handling large datasets using partitioning, Spark's in-memory capabilities, broadcast variables, and effective, efficient joins.
- Developed Spark jobs to create DataFrames from source systems and to process and analyze the data according to business requirements.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
- Experienced in performance tuning of Spark applications through data serialization techniques, the correct level of parallelism, and memory tuning.
- Monitored and debugged Spark jobs running on the cluster using Cloudera Manager.
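A minimal sketch of the Spark-on-YARN analytics described above, assuming a hypothetical partitioned Hive table of customer transactions; the table and column names are illustrative only.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object TxnAnalytics {
      def main(args: Array[String]): Unit = {
        // Submitted with spark-submit --master yarn --deploy-mode cluster
        val spark = SparkSession.builder()
          .appName("TxnAnalytics")
          .enableHiveSupport()
          .getOrCreate()

        // Read the partitioned external Hive table; filtering on the partition
        // column lets Spark prune partitions instead of scanning the whole table.
        val txns = spark.table("sales.customer_txns") // hypothetical table
          .where(col("txn_date") >= lit("2017-01-01"))

        val daily = txns
          .groupBy("txn_date")
          .agg(count("*").as("txn_count"), sum("amount").as("total_amount"))

        daily.write.mode("overwrite").saveAsTable("sales.daily_txn_summary")
      }
    }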
Environment: Java 1.8, Scala 2.6, Shell Scripting, Hadoop Yarn, Spark 2.2.1, LDAP, HDFS, Sqoop, Hive, Impala, Hue, Kafka, Cloudera Manager, Oracle, JIRA, Arcadia Enterprise.
Confidential, Philadelphia, PA
Hadoop Developer
Responsibilities:
- Involved in all phases of the Big Data Implementation including requirement analysis, design, development, building, testing, and deployment of Hadoop cluster.
- Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, HDFS, Hive and Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in migrating the existing Hadoop data lake to AWS EMR, uploading data into S3 buckets within the specific VPC.
- Performed end-to-end testing on EMR using Python.
- Ran each module with Python scripts on EMR before performing integration testing.
- Responsible for developing data pipeline using Sqoop, MapReduce and Hive to extract data from various sources and store the result for downstream consumption.
- Worked with source load teams to perform loads while ingestion jobs are in progress.
- Extensive experience in writing Pig scripts to transform raw data from several data sources into baseline data.
- Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
- Applied Hive partitioning and bucketing concepts and designed both managed and external tables in Hive to optimize performance.
- Used Hive for transformations, event joins, and pre-aggregations before storing data in HDFS.
- Wrote MapReduce jobs to parse logs into a tabular format that facilitates effective querying of log data (see the sketch after this list).
- Resolved performance issues in Hive and Pig scripts through an understanding of joins, grouping, and aggregation.
- Developed MapReduce jobs in Java for data cleaning and preprocessing.
- Very good experience in monitoring and managing the Hadoop cluster using Cloudera Manager.
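A minimal sketch of the log-parsing MapReduce work mentioned above, written in Scala for consistency with the other sketches; the log layout, field order, and counter names are hypothetical.

    import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
    import org.apache.hadoop.mapreduce.Mapper

    // Map-only job: parse raw access-log lines into tab-separated rows that a
    // Hive external table can sit on top of. Assumes a combined-log-style layout.
    class LogParseMapper extends Mapper[LongWritable, Text, NullWritable, Text] {
      private val LogLine = """^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}).*""".r

      override def map(key: LongWritable, value: Text,
                       ctx: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit =
        value.toString match {
          case LogLine(ip, ts, method, uri, status) =>
            ctx.write(NullWritable.get, new Text(Seq(ip, ts, method, uri, status).mkString("\t")))
          case _ =>
            ctx.getCounter("logparse", "malformed").increment(1) // count and skip bad lines
        }
    }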
Environment: Java, HDFS, Scala 2.6, Python 3.4.2, MapReduce, Flume, Pig, Sqoop, Hive, MySQL, Shell Scripting, Oozie, Cloudera Manager.
Confidential, Denver
Hadoop Developer.
Responsibilities:
- Imported and exported data into HDFS and Hive using Sqoop and Kafka.
- Developed different system components, such as Hadoop processes involving MapReduce and Hive.
- Developed an interface for validating incoming data in HDFS before kicking off Hadoop processing.
- Wrote optimized Hive queries using techniques such as user-defined functions, and customized Hadoop shuffle and sort parameters.
- Tuned Hive and Pig scripts to resolve performance issues, drawing on a good understanding of joins, grouping, aggregation, and how each translates into MapReduce jobs.
- Developed MapReduce programs for different file types using combiners, together with UDFs and UDAFs (see the sketch after this list).
- Worked with multi-node cluster tooling that offers several commands to report HBase usage.
- Created, dropped, and altered tables at runtime without blocking updates and queries, using HBase and Hive.
- Pre-processed logs and semi-structured content stored on HDFS using Pig.
- Imported and exported structured data into the Hive warehouse, enabling business analysts to write Hive queries.
- Managed and reviewed Hadoop log files.
- Wrote Unix shell scripts for business processes and for loading data from different interfaces into HDFS.
- Created Hive tables, loaded data, and wrote Hive queries and Pig scripts.
- Hands-on experience with Eclipse, PuTTY, WinSCP, and VNC Viewer.
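A minimal sketch of the kind of Hive UDF referenced above, written in Scala against the legacy UDF base class used by Hive 1.x; the function name and cleanup rule are hypothetical.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Registered in Hive with, e.g.:
    //   ADD JAR /path/udfs.jar;
    //   CREATE TEMPORARY FUNCTION normalize_zip AS 'NormalizeZip';
    class NormalizeZip extends UDF {
      // Keep only the leading five digits of a ZIP code (hypothetical rule).
      def evaluate(input: Text): Text = {
        if (input == null) null
        else new Text(input.toString.filter(_.isDigit).take(5))
      }
    }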
Environment: Linux 6.7, CDH 5.5.2, MapReduce, Hive 1.1, Pig, HBase, Shell Script, Sqoop 1.4.3, Eclipse, Java 1.8.
Confidential
Java Developer.
Responsibilities:
- Involved in Design, Development and Support phases of Software Development Life Cycle (SDLC).
- Analyzed software requirements to determine design feasibility within time and cost constraints.
- Designed and developed the user interface (UI) of web pages with HTML, CSS3, JavaScript, jQuery, Bootstrap, and AJAX.
- Involved in creating an interface to manage user menus and bulk-update attributes using AngularJS, Node.js, Ext JS, Require.js, and jQuery.
- Developed the controller and service layers using Spring MVC and Spring JDBC (see the sketch after this list).
- Implemented RESTful web services with Spring and AngularJS.
- Configured the Transaction Management for the project using Spring Container Managed Transactions.
- Created custom AngularJS directives and used dependency injection.
- Used SQL commands and stored procedures to retrieve data from the Oracle 11g database.
- Used the Hibernate ORM framework to communicate with the Oracle 11g database.
- Performed Unit testing on angular applications using tools like Karma, Jasmine.
- Involved in developing XML, HTML, and JavaScript for client-side presentation and data validation within the forms.
- Implemented a CSS3- and JavaScript-based navigation system visually identical to the previous table-based system.
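A minimal sketch of a controller and service layer of the kind described above. The original project was in Java; this is rendered in Scala for consistency with the other sketches, and the endpoint, table, and class names are hypothetical.

    import org.springframework.jdbc.core.JdbcTemplate
    import org.springframework.stereotype.Controller
    import org.springframework.web.bind.annotation.{PathVariable, RequestMapping, RequestMethod, ResponseBody}

    // Spring MVC controller delegating to a Spring JDBC-backed service layer.
    @Controller
    class MenuController(menuService: MenuService) {
      @RequestMapping(value = Array("/menu/{userId}"), method = Array(RequestMethod.GET))
      @ResponseBody
      def userMenu(@PathVariable("userId") userId: Long): java.util.List[String] =
        menuService.menuFor(userId)
    }

    class MenuService(jdbc: JdbcTemplate) {
      // queryForList maps each row's single column to a String.
      def menuFor(userId: Long): java.util.List[String] =
        jdbc.queryForList("SELECT item_name FROM user_menu WHERE user_id = ?",
          classOf[String], Long.box(userId))
    }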
Environment: Java, JDK 1.8, Apache Tomcat 7, JavaScript, JSP, JDBC, Servlets, SQL Server, XML, Windows XP, Ant, Eclipse.
Confidential, Milwaukee, WI
Software Engineer
Responsibilities:
- Developed a utility to match inputs and provide a match grade.
- Developed a customized way of loading custom data into dedicated tables and building indexes.
- Developed new JSPs to receive input and display output for the new modules.
- Implemented validations using front-end JavaScript.
- Wrote prepared statements and callable statements against JNDI-managed data sources for database update, insert, and delete operations and for invoking functions and stored procedures (see the sketch after this list).
- Created JUnit tests for the Struts application and web services to verify new functionality.
- Wrote Ant build scripts to create the runtime environment.
- Used PMD and Checkstyle during builds to ensure code compliance.
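A minimal sketch of the JNDI-backed prepared-statement work mentioned above, in Scala for consistency with the other sketches; the JNDI name, table, and DAO are hypothetical.

    import javax.naming.InitialContext
    import javax.sql.DataSource

    object OrderDao {
      // Container-managed connection pool looked up through JNDI.
      private val ds = new InitialContext()
        .lookup("java:comp/env/jdbc/AppDS").asInstanceOf[DataSource]

      def updateStatus(orderId: Long, status: String): Int = {
        val conn = ds.getConnection
        try {
          val ps = conn.prepareStatement("UPDATE orders SET status = ? WHERE order_id = ?")
          ps.setString(1, status)
          ps.setLong(2, orderId)
          ps.executeUpdate()
        } finally conn.close()
      }
    }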
Environment: Java, Struts 1.1, JAX-RPC, Oracle, MySQL, SQL, JUnit, Singleton, JNDI, Log4j, Ant, SVN.
