Big Data Developer Resume
Piscataway, NJ
SUMMARY
- A dynamic professional with 6+ years of diversified experience in the field of Information Technology, with an emphasis on Big Data/Hadoop ecosystem implementation, maintenance, ETL, and Big Data analysis operations.
- Hands-on experience with Hadoop ecosystem components such as HDFS, MapReduce, YARN, Pig, Hive, HBase, Oozie, Sqoop, Kafka, and Spark (a minimal streaming sketch follows this summary).
- Hands-on experience importing and exporting data between databases such as MySQL and Oracle and HDFS using Sqoop.
- Hands-on experience architecting and implementing Hadoop clusters on AWS using EMR, EC2, S3, Redshift, Cassandra, ArangoDB, CosmosDB, SimpleDB, Amazon RDS, DynamoDB, PostgreSQL, and MS SQL Server, with the ability to adapt that knowledge to the GCP and Azure cloud platforms.
- Experienced in cluster maintenance and commissioning/decommissioning of DataNodes, with a good understanding of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Well-versed in Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
- Expertise in installation, administration, patching, upgrades, configuration, performance tuning, and troubleshooting of Red Hat Linux, SUSE, CentOS, AIX, and Solaris.
- Experienced in scheduling recurring Hadoop jobs with Apache Oozie, and in Jumpstart, Kickstart, infrastructure setup, and installation methods for Linux.
- Managed and scheduled batch jobs on a Hadoop cluster using Oozie.
- Used ZooKeeper to provide coordination services to the cluster.
- Experienced using Sqoop to import data into HDFS from RDBMS and vice-versa.
- Extensive experience on integrating third party products using API, JDBC, Web Services.
- Developed and deployed real time and batch ETL using CI/CD pipeline for real-time data distribution, storage, and analytics.
- Strong experience implementing data warehousing applications using ETL/ELT tools such as Informatica and Snowflake.
- Hands-on experience with AWS cloud services (Lambda, EC2, S3, RDS, Redshift, Data Pipeline, EMR, SageMaker, Glue); experience extracting data from AWS S3 and loading it into a data mart in Amazon Redshift.
- Worked on developing a NiFi flow prototype for data ingestion into HDFS.
- Worked on data ingestion ETL pipelines for streaming and batch data from different data sources into a data lake using AWS Glue, AWS Lambda, AWS Kinesis, Kinesis Firehose, and S3.
- Knowledgeable in Python decorators, generators, and the collections module.
- Developed RESTful Web Services using Spring framework.
- Excellent communication and coordination skills with all stakeholders, including internal business clients, and experience working with cross-functional teams.
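Illustrative sketch (not tied to any specific engagement): a minimal PySpark Structured Streaming job reflecting the Kafka/Spark Streaming experience above, reading a hypothetical Kafka topic and persisting the events to HDFS as Parquet. The broker address, topic name, and paths are placeholders.

```python
# Illustrative sketch only: reads a hypothetical Kafka topic and writes the raw
# events to HDFS as Parquet. Broker, topic, and paths are placeholder values.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (SparkSession.builder
         .appName("kafka-to-hdfs-sketch")
         .getOrCreate())

# Read the stream from Kafka (requires the spark-sql-kafka package on the classpath).
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
          .option("subscribe", "billing-events")              # placeholder topic
          .option("startingOffsets", "latest")
          .load())

# Kafka delivers key/value as binary; cast the payload to string for storage.
parsed = events.select(col("key").cast("string"),
                       col("value").cast("string"),
                       col("timestamp"))

# Persist micro-batches to HDFS as Parquet with a checkpoint for fault tolerance.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/raw/billing_events")            # placeholder path
         .option("checkpointLocation", "hdfs:///checkpoints/billing")  # placeholder path
         .outputMode("append")
         .start())

query.awaitTermination()
```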
TECHNICAL SKILLS
Big Data Ecosystem: HDFS, MapReduce, Hive, Sqoop, HBase, Presto, ZooKeeper, Oozie, UC4, Kafka, Spark, Pig, NiFi
Cloud Technologies: Amazon Web Services, Amazon Redshift
AWS Services: Glue, Lambda, Athena, EC2, RDS, S3, CloudWatch, SNS, SQS, EMR, Kinesis
Databases: SQL Server, MySQL, Oracle, HBase, DB2
CI/CD Tools: Jenkins, Terraform, Docker
Languages: SQL, JavaScript, Java, Hive (HQL), Python
Other Tools: Jira, Putty, WINSCP, EDI(Gentran), Stream weaver
Operating Systems: Windows 7/10, UNIX, Linux, macOS
PROFESSIONAL EXPERIENCE
Big Data Developer
Confidential, Piscataway, NJ
Responsibilities:
- Consult with leadership and stakeholders to share design recommendations, identify product and technical requirements, resolve technical problems, and suggest Big Data-based analytical solutions.
- Implement solutions for ingesting data from various sources and processing the datasets using Big Data technologies such as Hadoop, Hive, Kafka, MapReduce frameworks, and HBase.
- Developed Python code to gather data from HBase (Cornerstone) and designed the solution for implementation using PySpark.
- Created PySpark programs to load data from PySpark DataFrames into Hive and MongoDB (see the sketch at the end of this section).
- Experienced in migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Developed Python scripts to start and end jobs cleanly within a UC4 workflow.
- Involved in migrating objects from Teradata to Snowflake.
- Worked on Amazon Web Services (AWS) to integrate EMR with Spark 2, S3 storage, and Snowflake.
- Installed and configured Apache Airflow for workflow management and created workflows in Python.
- Analyzed source systems such as Oracle RDBMS database tables, performed analysis and data modeling for source-to-target mapping, and built data pipelines to ingest data into Hadoop per business requirements.
- Worked in the AWS environment for development and deployment of custom Hadoop applications.
- Implemented Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), and AWS Redshift.
- Specified nodes and performed data analysis queries on Amazon Redshift clusters and via AWS Athena.
- Designed and developed real-time data streaming solutions using Apache Kafka and built data pipelines to store large datasets in NoSQL databases such as HBase.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, PySpark, Spark SQL, DataFrames, and pair RDDs.
- Created HBase key-spaces and tables, using a map function to store JSON records.
- Developed Kafka producers and consumers from scratch per business requirements.
- Extensively worked on Spark with Scala to prepare data for building models consumed by the Data Science team.
- Developed data pipelines using Sqoop to ingest billing and order-event data from Oracle tables into Hive tables.
- Created Flume source and sink agents to ingest log files from an SFTP server into HDFS for analysis.
- Created Hive tables as per requirements, either internal or external, defined with appropriate static and dynamic partitions for efficiency.
- Developed end-to-end data processing pipelines that begin with receiving data through the distributed messaging system Kafka and end with persistence of the data into HBase.
- Worked on Spark SQL, created DataFrames by loading data from Hive tables, created prepared datasets, and stored them in AWS S3.
- Scheduled jobs using Oozie actions such as Shell, Spark, and Hive actions.
Environment: Hadoop, Hortonworks, Linux, Hive, Scala, Python, MapReduce, HDFS, Kafka, Spark, Cassandra, Shell Scripting, Sqoop, Maven, Spring Framework, Snowflake, Jira, Oracle Database, AWS S3, EC2, Redshift, Azure
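As referenced above, a minimal sketch of loading a PySpark DataFrame into a Hive table. The database, table, partition column, and input path are assumed placeholders; the MongoDB write mentioned in the same bullet would additionally require the MongoDB Spark connector.

```python
# Minimal sketch: load source data into a PySpark DataFrame and persist it to Hive.
# Database, table, partition column, and path names are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("load-to-hive-sketch")
         .enableHiveSupport()          # required so saveAsTable targets the Hive metastore
         .getOrCreate())

# Read prepared source data (e.g., Parquet output of an upstream job).
df = spark.read.parquet("hdfs:///data/prepared/orders")   # placeholder path

# Write into a partitioned, managed Hive table; append keeps existing partitions.
(df.write
   .mode("append")
   .partitionBy("order_date")          # placeholder partition column
   .saveAsTable("analytics.orders"))   # placeholder database.table
```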
Spark/Hadoop Developer/Big Data
Confidential, Irving, TX
Responsibilities:
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including MapReduce, Hive, Spark, and Scala.
- Extensively used Hive, Spark optimization techniques like Partitioning, Bucketing, Map Join, parallel execution, Broadcast join and Repartitioning.
- Used Google Cloud Functions with Python to load data into BigQuery for CSV files arriving in a GCS bucket (see the sketch at the end of this section).
- Wrote a program to download a SQL dump from the equipment maintenance site and load it into a GCS bucket; then loaded the SQL dump from the GCS bucket into MySQL (hosted in Google Cloud SQL) and loaded the data from MySQL into BigQuery using Python, Scala, Spark, and Dataproc.
- Processed and loaded bounded and unbounded data from Google Pub/Sub topics into BigQuery using Cloud Dataflow with Python.
- Designed, set up, maintained, and administered Azure SQL Database, Azure Analysis Services, Azure SQL Data Warehouse, and Azure Data Factory.
- Deployed Azure Resource Manager JSON templates from PowerShell; worked on the Azure suite: Azure SQL Database, Azure Data Lake, Azure Data Factory, Azure SQL Data Warehouse, and Azure Analysis Services.
- Analyzed the asset matrix (mapping document) and used it to develop Spark/Scala projects in IntelliJ.
- Designed and implemented migration strategies for traditional systems to Azure (lift and shift/Azure Migrate and other third-party tools); worked on the Azure suite: Azure SQL Database, Azure Data Lake (ADLS), Azure Data Factory (ADF) V2, Azure SQL Data Warehouse, Azure Service Bus, Azure Key Vault, Azure Analysis Services (AAS), Azure Blob Storage, Azure Search, Azure App Service, and Azure Data Platform Services.
- Involved in migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Maintained the code base in Bitbucket, where the production support team uses Jenkins for deployment.
- Worked on the Hortonworks Data Platform with AWS S3 and Redshift, where the target tables are provided to the BI team for their dashboards.
- The ingestion team uses Apache NiFi to pull data from various data sources into the data lake.
- Built Scala modules in IntelliJ, where the entire Spark/Scala logic is developed to meet business requirements.
- Used Oozie to schedule Spark jobs and to trigger Spark jobs in both client and cluster mode in lower environments.
- Created partitioned, bucketed Hive tables, loaded data into respective partitions at runtime, for quick downstream access.
- Developed shell scripts to generate the Hive CREATE statements from the data and load the data into the tables.
Environment: Hadoop, HDFS, MapReduce, Hive, Spark, Spark SQL, Sqoop, DB2, Scala, Python, Big Data, Kafka, Streaming, AWS, Azure, EC2
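As referenced above, an illustrative sketch of a Python Cloud Function that loads an arriving CSV file from a GCS bucket into BigQuery. The project, dataset, and table names are placeholders, and the function is assumed to be deployed with a google.storage.object.finalize trigger.

```python
# Illustrative Cloud Function sketch (GCS finalize trigger): loads an arriving CSV
# from the bucket into BigQuery. The project/dataset/table names are placeholders.
from google.cloud import bigquery

TABLE_ID = "my-project.raw_data.csv_landing"   # placeholder project.dataset.table

def load_csv_to_bigquery(event, context):
    """Triggered by a google.storage.object.finalize event."""
    if not event["name"].endswith(".csv"):
        return  # ignore non-CSV objects

    uri = f"gs://{event['bucket']}/{event['name']}"
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # assume a header row
        autodetect=True,       # let BigQuery infer the schema
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    load_job = client.load_table_from_uri(uri, TABLE_ID, job_config=job_config)
    load_job.result()  # wait for completion so errors surface in the function logs
```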
Hadoop Developer
Confidential, Michigan
Responsibilities:
- Imported data in various formats such as JSON, SequenceFile, text, CSV, Avro, and Parquet into the HDFS cluster with compression for optimization.
- Worked on ingesting data from RDBMS sources such as Oracle and SQL Server into HDFS using Sqoop.
- Loaded all datasets into Hive from source CSV files using Spark, and loaded them into Cassandra using Spark (see the sketch at the end of this section).
- Created UDFs in Spark using Java according to business requirements and loaded the data into the database.
- Developed Spark applications for both batch and streaming processing.
- Developed Python scripts to clean the raw data.
- Designed a data analysis pipeline in Python, using Amazon Web Services such as S3, EC2 and Elastic MapReduce
- Managed and reviewed the Hadoop log files using shell scripts.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
- Used Hive join queries to join multiple tables of a source system and loaded the results into Elasticsearch tables.
- Involved in Cluster maintenance, Cluster Monitoring and Troubleshooting.
- Loaded datasets into Hive from source CSV files using Sqoop.
- Loaded data into Cassandra from source CSV files using Spark.
- Completed data extraction, aggregation, and analysis in HDFS using PySpark and stored the required data in Hive.
- Designed and developed ETL jobs to extract data from AWS S3 and loaded it into data mart in Amazon Redshift.
Environment: Spark, Spark SQL, Spark Streaming, ETL, Scala, Python, Java, Hadoop, HDFS, Hive, Sqoop, AWS, Shell Scripting, HBase, Jenkins, Splunk, MySQL
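As referenced above, a minimal sketch of reading source CSV files with Spark and writing them to Cassandra via the Spark Cassandra Connector (assumed to be available on the classpath). The host, keyspace, table, and path names are placeholders.

```python
# Minimal sketch: read source CSVs with Spark and write them to Cassandra.
# Requires the Spark Cassandra Connector package; names below are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("csv-to-cassandra-sketch")
         .config("spark.cassandra.connection.host", "cassandra-host")  # placeholder host
         .getOrCreate())

# Read the raw CSV files with a header row and an inferred schema.
df = spark.read.csv("hdfs:///data/source/*.csv", header=True, inferSchema=True)

# Append the rows into an existing Cassandra table.
(df.write
   .format("org.apache.spark.sql.cassandra")
   .options(keyspace="analytics", table="events")   # placeholder keyspace/table
   .mode("append")
   .save())
```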
Hadoop Developer
Confidential
Responsibilities:
- Developed MapReduce jobs for log analysis and analytics and to generate reports on the number of activities created in a day (see the sketch at the end of this section).
- Involved in implementing complex MapReduce programs to perform joins on the map side using distributed cache in Java.
- Implemented analytical algorithms using MapReduce programs to apply on HDFS data.
- Developed the application with the help of the Struts framework, which uses the Model-View-Controller (MVC) architecture with JSP as the view.
- Created common Java components shared between the applications to convert data into the appropriate state for each application.
- Used the IBM RAD application tool to debug Java issues while deploying or integrating with other Java applications.
- Scheduled automated tasks with Oozie for loading data into HDFS through Sqoop and pre-processing the data with Pig and Hive.
- Wrote Java code for file writing and reading, with extensive use of the ArrayList and HashMap data structures.
- Wrote test cases which adhere to a Test-Driven Development (TDD) pattern.
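The log-analysis MapReduce work above was written in Java; as an illustration only, the sketch below expresses the same activity-counting pattern as Hadoop Streaming mapper and reducer logic in Python. The log layout and column positions are assumptions.

```python
# Illustrative Hadoop Streaming sketch of the log-analysis counting pattern.
# The tab-separated log layout and the activity column position are assumptions.
import sys

def mapper():
    # Emit one (activity, 1) pair per log line.
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            activity = fields[1]          # assume the activity name is the second column
            print(f"{activity}\t1")

def reducer():
    # Sum the counts for each activity; streaming input arrives sorted by key.
    current_key, total = None, 0
    for line in sys.stdin:
        key, count = line.rstrip("\n").split("\t")
        if key != current_key:
            if current_key is not None:
                print(f"{current_key}\t{total}")
            current_key, total = key, 0
        total += int(count)
    if current_key is not None:
        print(f"{current_key}\t{total}")

if __name__ == "__main__":
    # In practice these would be two scripts passed to hadoop-streaming as
    # -mapper and -reducer; here one file dispatches on a command-line flag.
    reducer() if "--reduce" in sys.argv else mapper()
```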