- Overall 8+ years of IT experience across Java, SQL, ETL, and Big Data, including 4+ years in Big Data, Hadoop, and NoSQL technologies in domains such as insurance, finance, and health care. Passionate about working in Big Data environments.
- Strong knowledge of Hadoop architecture and the functioning of components such as HDFS, NameNode, DataNode, JobTracker, TaskTracker, MapReduce, and Spark.
- Extensive experience providing Big Data solutions using Hadoop 2.x, HDFS, MR2, YARN, Kafka, Pig, Hive, Sqoop, HBase, Cloudera Manager, Hortonworks, Zookeeper, Oozie, and Hue.
- Experience importing and exporting data between HDFS/Hive/HBase and relational database systems using Sqoop. Skilled in data migration and data generation in the Big Data ecosystem.
- Experienced in building highly scalable Big Data solutions using Hadoop across multiple distributions (Cloudera, Hortonworks) and NoSQL platforms (HBase).
- Implemented Big Data batch processes using Hadoop, MapReduce, YARN, Pig, and Hive.
- Hands-on experience in in-memory data processing with Apache Spark using Scala and Python.
- Worked with Spark on an EMR cluster alongside other Hadoop applications, where Spark can use the EMR File System (EMRFS) to access data in Amazon S3 directly.
- Responsible for interacting with clients to understand their business problems related to Big Data, cloud computing, and NoSQL technologies.
- Experienced in using Kafka as a distributed publisher-subscriber messaging system.
- Good experience in writing Pig scripts and Hive Queries for processing and analyzing large volumes of data.
- Developed Spark scripts using Scala shell commands as per requirements.
- Experience optimizing MapReduce jobs using Combiners and Partitioners.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Good knowledge of AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
- Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication and authorization infrastructure.
- Strong scripting skills in Python and Unix shell.
- Experience in managing and reviewing Hadoop log files.
- Hands-on experience in application development using RDBMS and Linux shell scripting.
- Good working experience in Agile/Scrum methodologies, including technical discussions with clients and daily Scrum calls covering project analysis, specifications, and development.
- Ability to work independently as well as in a team and able to effectively communicate with customers, peers and management at all levels in and outside the organization.
Big Data/Hadoop ecosystem: HDFS, MapReduce, YARN, Apache NiFi, Hive, Pig, HBase, Impala, Zookeeper, Sqoop, Flume, Oozie, Spark, Apache Phoenix, Zeppelin, EMR.
Programming Languages: C, C++, Java, Scala, SQL, PL/SQL, Python, Linux shell scripts.
Methodologies: Agile, Scrum, Waterfall
NoSQL Database: HBase, Cassandra, MongoDB
Database: Oracle 10g, Teradata, DB2, MS Azure.
Tools Used: Eclipse, IntelliJ, Git, PuTTY, WinSCP
Operating systems: Windows, Unix, Linux, Ubuntu
Confidential, Princeton Junction, NJ.
Spark & NiFi Developer
Roles & Responsibilities:
- Implemented solutions using AWS components (EMR, EC2, etc.) integrated with Big Data/Hadoop frameworks: Zookeeper, YARN, Spark, Scala, NiFi, etc.
- Designed and implemented Spark jobs to be deployed and run on existing active clusters.
- Configured a Postgres database on EC2 instances, verified that the application was up and running, and troubleshot issues to reach the desired application state.
- Worked on creating and configuring secure VPC, Subnets, and Security Groups through private and public networks.
- Created alarms, alerts, and notifications for Spark jobs that send job status to email and a Slack group and log to CloudWatch.
- Worked on a NiFi data pipeline to process large data sets and configured lookups for data validation and integrity.
- Generated large sets of test data with data integrity using Java, for use in the development and QA phases.
- Worked in Spark with Scala to improve the performance of existing applications running on an EMR cluster.
- Worked on a Spark job to convert CSV data to custom HL7/FHIR objects using FHIR APIs.
- Deployed SNS, SQS, Lambda functions, IAM roles, custom policies, and EMR with Spark and Hadoop setup, plus bootstrap scripts to install additional software needed to run the job in QA and production environments, using Terraform scripts.
- Worked on a Spark job to perform Change Data Capture (CDC) on Postgres tables and update target tables using JDBC.
- Worked on a Kafka publisher integrated into a Spark job to capture errors from the Spark application and push them into a Postgres table.
- Worked extensively on building NiFi data pipelines in a Docker container environment during the development phase.
- Worked with the DevOps team to cluster the NiFi pipeline on EC2 nodes, integrated over SSL with Spark, Kafka, and Postgres running on other instances, in QA and production environments.
Confidential, New York City, New York
Big Data Developer
Roles & Responsibilities:
- Worked in an Agile team and completed stories related to ingestion, transformation, and publication of data on time.
- Performed validations and consolidations for imported data, data migration, and data generation.
- Ingested data sets from different databases and servers using Sqoop, the Talend import tool, and the MFT (Managed File Transfer) inbound process with Elasticsearch.
- Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala and Java.
- Developed Spark scripts using Scala shell commands and Java as per requirements.
- Used Spark Streaming to consume topics from distributed messaging sources (Talend, Kafka) and periodically push batches of data to Spark for real-time processing in Elasticsearch.
- Worked with teams to analyze anomaly detection and data ratings using the ETL tool Talend.
- Developed complex HiveQL queries using the JSON SerDe.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Involved in importing real-time data into Hadoop using Elasticsearch, Kafka, and Talend, and implemented an Oozie job for daily imports.
- Wrote Pig Latin Scripts and Hive Queries using Avro schemas to transform the Data sets in HDFS.
- As part of support, was responsible for troubleshooting MapReduce jobs, Java scripts, Pig jobs, and Hive.
- Worked on performance tuning of Hive & Pig Jobs.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data in Amazon EMR.
Environment: Hadoop, Cloudera, MapReduce, Spark, Shark, Hive, Apache NiFi, Pig, Sqoop, Shell Scripting, Storm, Talend, Kafka, Datameer, Oracle, Teradata, SAS, Arcadia, Java 7.0, Nagios, Spring, JIRA, EMR.
Confidential, Tampa, Florida
Hadoop and Spark Developer
Roles & Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and NiFi for data cleaning and preprocessing.
- Imported data from Oracle databases into HDFS and exported it back using Sqoop.
- Installed and configured Hadoop Cluster for major Hadoop distributions.
- Used Hive, Pig, and Talend as ETL tools for event joins, filters, transformations, and pre-aggregations.
- Created partitions and buckets by state in Hive to handle structured data with Elasticsearch.
- Developed a workflow in Oozie to orchestrate a series of Pig scripts that cleanse data in the data preparation stage, for example by removing personal information or merging many small files into a handful of very large, compressed files.
- Involved in moving log files generated from various sources to HDFS for further processing through Elasticsearch, Kafka, Flume, and Talend, and processed the files using Piggybank.
- Extensively used Pig to communicate with Hive using HCatalog and with HBase using handlers.
- Used the Spark SQL interfaces for Scala and Python, which automatically convert RDDs of case classes to SchemaRDDs.
- Used Spark SQL to read and write tables stored in Hive and Amazon EMR.
- Used Sqoop for various file transfers through HBase tables to process data into several NoSQL databases, including Cassandra and MongoDB.
- Created tables, secondary indexes, join indexes, and views in the Teradata development environment for testing.
- Captured data logs from web servers and Elasticsearch into HDFS using Flume for analysis.
- Managed and reviewed Hadoop log files.
Environment: Hive, Pig, MapReduce, Apache NiFi, Sqoop, Oozie, Flume, Kafka, Talend, EMR, Storm, HBase, Unix, Linux, Python, SQL, Hadoop 1.x, HDFS, GitHub, Python Scripting.
Confidential, Boston, MA
Roles & Responsibilities:
- As a Big Data Developer, implemented solutions for ingesting data from various sources and processing Data-at-Rest using Big Data technologies such as Hadoop, MapReduce frameworks, HBase, Hive, Oozie, Flume, Sqoop, and Elasticsearch.
- Designed and Implemented real-time Big Data processing to enable real-time analytics, event detection and notification for Data-in-Motion.
- Hands-on experience with Confidential Big Data product offerings such as Confidential Info Sphere Big Insights, Confidential Info Sphere Streams, Confidential BigSQL.
- Loaded and transformed large sets of structured and semi-structured data using Hive and Impala with Elasticsearch.
- Worked on an Asset Tracking project that collected real-time vehicle location data from a JMS queue using Confidential Streams and processed that data for vehicle tracking using ESRI GIS mapping software, Scala, and the Akka actor model.
- Developed Hive queries in the BigSQL client for various use cases.
- Developed several shell scripts and automated them using the cron job scheduler.
Environment: Hadoop 1.x, Hive 0.10, Pig 0.11, Sqoop, HBase, UNIX Shell Scripting, Scala, Akka, Confidential Info Sphere Big Insights, Confidential Info Sphere Streams, Confidential BigSQL, Java.
Confidential - Chicago, IL
Role: Hadoop/Spark Consultant
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Spark SQL, and Scala; extracted large datasets from data lakes, Cassandra, and Oracle servers into HDFS and vice versa using Sqoop.
- Experience in managing AWS Hadoop clusters and services using Hortonworks Manager.
- Explored Spark for improving the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Worked with different file formats such as Text, Sequence files, Avro, ORC and Parquet.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Imported data from sources such as HDFS and HBase into Spark RDDs.
- Experience with Kafka and Storm for real-time analytics, and with AML for data analytics.
Environment: Hadoop, Hortonworks, MapReduce, Spark, Shark, Hive, Apache NiFi, Pig, Sqoop, Shell Scripting, Storm, Kafka, Datameer, Oracle, Teradata, SAS, Arcadia, Java 7.0, Nagios, Spring, JIRA.
Confidential - Dallas, Texas
- Created data lakes and data pipelines for different mobile application events, to filter and load consumer response data from Urban Airship in an AWS S3 bucket into Hive external tables in HDFS. Good experience with the Apache NiFi ecosystem.
- Worked with file formats such as JSON, Avro, and Parquet and compression codecs such as Snappy, using the NiFi ecosystem.
- Used data warehousing tools such as Talend and Teradata.
- Developed Impala scripts for end-user and analyst ad hoc analysis requirements.
- Used various Hive optimization techniques such as partitioning, bucketing, and map joins.
- Developed Python code for tasks, dependencies, SLA watchers, and time sensors for each job, for workflow management and automation using Airflow.
- Developed shell scripts for adding dynamic partitions to the Hive stage table, verifying JSON schema changes in source files, and detecting duplicate files in the source location.
- Developed UDFs in Spark to capture values of key-value pairs in an encoded JSON string.
- Developed a Spark application to filter JSON source data in an AWS S3 location and store it in HDFS with partitions, and used Spark to extract the schema of JSON files.
Environment: Hive, Apache NiFi, Spark, AWS S3, EMR, Cloudera, Jenkins, Shell scripting, HBase, Airflow, IntelliJ IDEA, Sqoop, Impala.
Junior Java Developer/ Software Intern
Roles & Responsibilities:
- Used Eclipse 6.0 as IDE for application development.
- Used JDBC to connect the web applications to Databases.
- Developed and utilised J2EE services and JMS components for messaging communication in WebLogic.
- Worked on analyzing a Hadoop cluster using different Big Data analytic tools including Pig, Hive, and MapReduce.
- Developed a parser and loader MapReduce application to retrieve data from HDFS and store it in HBase and Hive.
- Imported data from MySQL and Oracle databases into HDFS using Sqoop.
- Imported unstructured data into HDFS using Flume.
- Wrote MapReduce Java programs to analyze log data for large-scale data sets.
- Involved in creating Hive (HCatalog) tables and in loading and analyzing data using Hive queries.
- Worked hands-on with the ETL process and was involved in developing Hive scripts for extraction, transformation, and loading of data into other data warehouses.
- Used Hive join queries to join multiple tables of a source system and load them into Elasticsearch.