Sr. Big Data Developer Resume

PROFESSIONAL SUMMARY:

  • Big Data Developer with over 7 years of professional IT industry experience spanning build and release management, software configuration, design, development, and cloud implementation.
  • Hands-on experience migrating data from legacy systems to cloud platforms such as AWS, Azure, and GCP.
  • Hands-on experience with various Azure services, including Blob Storage, Cosmos DB, Azure Databricks, Azure Data Factory, Event Hubs, and Azure Data Lake.
  • Over 3 years of experience with AWS services, with a broad and in-depth understanding of each.
  • Hands-on experience with GCP, including Cloud Storage buckets, Cloud Functions, Cloud Dataflow, Dataproc, Pub/Sub, Cloud Shell, and Stackdriver.
  • Good knowledge of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (see the Hive DDL sketch after this summary).
  • Experienced in designing, deploying, and operating highly available, scalable, and fault-tolerant systems using Amazon Web Services (AWS).
  • Experienced with event-driven and scheduled AWS Lambda functions to trigger various AWS resources.
  • Experience working with Azure Databricks.
  • Hands-on experience writing MapReduce jobs in the Hadoop ecosystem, including Hive and Pig.
  • Experienced with installing the AWS CLI to control various AWS services through shell/Bash scripting.
  • Worked extensively with Apache NiFi to design and develop data pipelines that process large data sets, and configured lookups for data validation and integrity in HDFS and the AWS cloud.
  • Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL (a connector sketch follows this summary).
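As context for the Hive partitioning and bucketing noted above, the following is a minimal sketch of the kind of DDL involved, run here through a Spark session with Hive support; the claims table, columns, and HDFS location are hypothetical placeholders rather than actual project objects.

```python
from pyspark.sql import SparkSession

# Minimal sketch: an external, partitioned, bucketed Hive table stored as ORC.
spark = SparkSession.builder.appName("hive-ddl-sketch").enableHiveSupport().getOrCreate()

spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS claims (
        claim_id    STRING,
        customer_id STRING,
        amount      DOUBLE
    )
    PARTITIONED BY (claim_date STRING)          -- lets queries prune by date
    CLUSTERED BY (customer_id) INTO 16 BUCKETS  -- spreads rows for joins and sampling
    STORED AS ORC
    LOCATION '/data/warehouse/claims'           -- external: Hive does not own the files
""")
```

An external table keeps the underlying files when the table is dropped, which is why it is the usual choice when the same HDFS data is shared by more than one consumer.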
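For the Python and SnowSQL ETL pipelines, a minimal sketch using the snowflake-connector-python package is shown below; the connection parameters, local file path, and ORDERS table are placeholders, not the actual pipeline.

```python
import snowflake.connector  # pip install snowflake-connector-python

# Hypothetical sketch: stage a local CSV in the table stage, then COPY it into Snowflake.
conn = snowflake.connector.connect(
    user="<user>", password="<password>", account="<account>",
    warehouse="<warehouse>", database="<database>", schema="<schema>",
)
try:
    cur = conn.cursor()
    cur.execute("PUT file:///tmp/orders.csv @%ORDERS")  # upload into the ORDERS table stage
    cur.execute("""
        COPY INTO ORDERS
        FROM @%ORDERS
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    """)
finally:
    conn.close()
```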

TECHNICAL SKILLS:

Hadoop/Big Data Technologies: Hadoop, Apache Spark, HDFS, MapReduce, Sqoop, Hive, Oozie, ZooKeeper, Cloudera Manager, Kafka, Flume

Programming & Scripting: Python, Scala, SQL, Shell Scripting

Databases: MySQL, Oracle Exastack, PostgreSQL, MS SQL Server, Teradata

NoSQL Databases: HBase, Cassandra, DynamoDB, MongoDB, Cosmos DB

Hadoop Distributions: Hortonworks, Cloudera

Version Control: Git, Bitbucket, SVN

Operating Systems: Linux, Unix, Mac OS X, CentOS, Windows 10, Windows 8, Windows 7

Cloud Computing: AWS, Azure, GCP

PROFESSIONAL EXPERIENCE:

Confidential

Sr. Big Data Developer

Responsibilities:

  • Developed and analyzed large-scale, high-speed, low-latency data solutions using big data technologies, Apache Hive, Apache Spark, and Scala.
  • Built a Spark Scala application in Azure Databricks to read data from Azure Cosmos DB via the change feed and write it to a Delta table.
  • Built a generic framework to ingest CSV, JSON, and Parquet data from Blob Storage into Delta tables (see the ingestion sketch after this list).
  • Created pipelines in Azure Data Factory using linked services, datasets, and pipelines to extract data from sources such as Azure SQL, Azure Blob Storage, Cosmos DB, and Azure SQL Data Warehouse.
  • Extensively used Databricks notebooks to read streaming data from Event Hubs and write it to Delta tables.
  • Extensively used the Databricks CLI to copy files from the local system to DBFS.
  • Implemented a Java Spring Boot application to consume real-time messages from Kafka, write them to Postgres and mainframe DB2 tables, and produce messages to other topics.
  • In-depth understanding of Kafka architecture and performance tuning techniques.
  • Improved Spring Boot application performance by tuning Kafka partitions, pool size, concurrency, and related settings.
  • Integrated the Spark and Spring Boot applications with Splunk for logging.
  • Designed and developed a PySpark framework for the insurance claims systems to read data from source systems such as Postgres, DB2, and HBase and generate files per business requirements (a JDBC read sketch also follows this list).
  • Implemented complex business logic in PySpark and created test cases for the scenarios developed.
  • Connected Spark to databases such as Postgres and DB2 to read and write data.
  • Implemented Jenkins scripts and jobs to push code from Git to Kubernetes.
  • Automated job deployments by building Docker images and running them in a Kubernetes cloud environment.
  • Analyzed business requirements by studying existing processes and prepared design documents for the applications.
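A minimal PySpark sketch of the generic Blob-to-Delta ingestion mentioned above, assuming a Databricks cluster that can already reach the storage account; the container, paths, formats, and table names are placeholders, not the project's actual framework.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks notebooks

def ingest_to_delta(source_path: str, source_format: str, target_table: str) -> None:
    """Generic ingestion: read CSV/JSON/Parquet from Blob storage and append to a Delta table."""
    reader = spark.read.format(source_format)
    if source_format == "csv":
        reader = reader.option("header", "true").option("inferSchema", "true")
    df = reader.load(source_path)
    df.write.format("delta").mode("append").saveAsTable(target_table)

# Hypothetical usage; the abfss path and target table are placeholders.
ingest_to_delta(
    "abfss://raw@<storageaccount>.dfs.core.windows.net/claims/",
    "parquet",
    "bronze.claims",
)
```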
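And a sketch of reading one of the claim source systems over JDBC in PySpark, assuming the Postgres JDBC driver is installed on the cluster; the host, credentials, column names, and output path are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical connection details; in practice these came from secured configuration.
claims_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://<host>:5432/claims")
    .option("dbtable", "public.claim_detail")
    .option("user", "<user>")
    .option("password", "<password>")
    .option("driver", "org.postgresql.Driver")
    .load()
)

# Example output step: write an extract as pipe-delimited files for downstream consumers.
(claims_df
    .filter("claim_status = 'OPEN'")   # placeholder business filter
    .coalesce(1)
    .write.mode("overwrite")
    .option("sep", "|")
    .option("header", "true")
    .csv("/mnt/extracts/open_claims"))
```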

Environment: Azure Blob Storage, Azure Databricks, Azure Data Factory, Cosmos DB, Event Hubs, Databricks CLI, MySQL, Hive, Teradata, Cloud SQL.

Confidential

ETL Developer

Responsibilities:

  • Designed and built a terabyte-scale, end-to-end data warehouse infrastructure from the ground up on Redshift, handling millions of records.
  • Developed Oozie workflows to extract data with Sqoop per business requirements.
  • Used Hive during the data exploration stage to derive insights from the processed data in HDFS.
  • Worked on big data workloads using AWS cloud services such as EC2, S3, EMR, and DynamoDB.
  • Applied Hive SQL, Presto SQL, and Spark SQL for ETL jobs, choosing the right technology for each task.
  • Responsible for ETL and data validation using SQL Server Integration Services.
  • Designed and developed ETL jobs to extract data from a Salesforce replica and load it into a data mart in Redshift.
  • Analyzed large data sets using pandas to identify trends and patterns in the data (see the analysis sketch after this list).
  • Built regression models with SciPy to forecast future values and visualized the results.
  • Managed large datasets using pandas DataFrames and MySQL for analysis.
  • Developed schemas to support reporting requirements in Tableau.
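A minimal sketch of the pandas and SciPy trend analysis described above; the CSV file and column names are hypothetical placeholders for the actual data sets.

```python
import pandas as pd
from scipy import stats

# Hypothetical daily-sales extract; file and column names are placeholders.
df = pd.read_csv("daily_sales.csv", parse_dates=["order_date"])

# Aggregate to a daily revenue series and fit a simple linear trend.
daily = df.groupby("order_date", as_index=False)["revenue"].sum()
x = (daily["order_date"] - daily["order_date"].min()).dt.days
slope, intercept, r_value, p_value, std_err = stats.linregress(x, daily["revenue"])

# Project revenue 30 days past the last observation.
forecast = intercept + slope * (x.max() + 30)
print(f"trend slope={slope:.2f}/day, r^2={r_value**2:.3f}, 30-day projection={forecast:,.0f}")
```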

Environment: Python, Hadoop, MapReduce, HiveQL, Hive, HBase, Sqoop, Cassandra, Flume, Tableau, Impala, Oozie, MySQL, Oracle SQL, Pig Latin, AWS, NumPy.

Confidential

Data Analytics Engineer

Responsibilities:

  • Led a team of 2 for telecom inventory maintenance for 5+ customers using SQL Server, achieving historic annual savings of 8% with service providers AT&T, Verizon, Granite Telecom, and CenturyLink.
  • Collaborated with cross-functional operations teams to gather, organize, customize, and analyze product-related data to maintain customer inventories worth approximately $10M-$40M.
  • Used SQL to fulfill ad-hoc data requests and created report metrics that accelerated reporting by 3%.
  • Developed SQL queries that improved data-retrieval efficiency by 8% and analyzed expenses based on requirements.
  • Analyzed existing SQL scripts and redesigned them in PySpark SQL for faster performance.
  • Built Jupyter notebooks using PySpark for extensive data analysis and exploration.
  • Implemented code coverage and Sonar integration to improve code testability.
  • Pushed application logs and data-stream logs to Application Insights for monitoring and alerting.
  • Used Oracle Exastack as part of the solution architecture.
  • Worked on migrating data from HDFS to Azure HDInsight and Azure Databricks.
  • Designed solutions with Azure tools such as Azure Data Factory, Azure Data Lake, Azure SQL, Azure SQL Data Warehouse (SQL DWH), and Azure Functions.
  • Migrated existing processes and data from on-premises SQL Server and other environments to Azure Data Lake.
  • Used Azure Databricks as a fast, collaborative, Spark-based platform on Azure.
  • Used Databricks for its easy integration with the rest of the Microsoft stack.
  • Implemented large Lambda architectures using Azure data platform capabilities such as Azure Data Lake, Azure Data Factory, HDInsight, Azure SQL Server, Azure ML, and Power BI.
  • Developed Spark applications in Python (PySpark) on a distributed environment to load large numbers of CSV files with differing schemas into Hive ORC tables (see the PySpark sketch after this list).
  • Contributed to the data lake implementation, developed data pipelines, and applied business logic using Apache Spark.
  • Implemented various optimization techniques to improve Spark application performance (a tuning sketch also follows this list).
  • Developed Jenkins and Ansible pipelines for continuous integration and deployment.
  • Built SFTP integrations with Azure Data Factory to onboard external vendors.
  • Designed and implemented a data profiling and data quality solution to analyze, match, cleanse, and consolidate data before loading it into the data warehouse.
  • Used Power BI functions and pivot tables to further analyze complex data.
  • Extracted data from CRM systems into a staging area and loaded it into the target database through ETL processes using Informatica PowerCenter.
  • Analyzed data with Hive queries and Pig scripts to understand user behavior.
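A minimal PySpark sketch of loading CSV files with differing schemas into Hive ORC tables, as mentioned above; the feed paths, target tables, and the one-table-per-feed approach are assumptions for illustration.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("csv-to-hive-orc")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical mapping of source feeds (each with its own schema) to target Hive tables.
feeds = {
    "/data/landing/customers/": "staging.customers",
    "/data/landing/transactions/": "staging.transactions",
}

for path, table in feeds.items():
    df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")   # schemas differ per feed, so infer at read time
          .csv(path))
    # Persist as ORC-backed Hive tables so downstream Hive and Spark SQL queries can use them.
    df.write.mode("overwrite").format("orc").saveAsTable(table)
```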
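And a sketch of a few common Spark tuning techniques of the kind referred to above (broadcast join for a small dimension, repartitioning on the aggregation key, caching a reused result); the table and column names are placeholders, and the techniques actually applied on the project may have differed.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

facts = spark.table("staging.transactions")  # large fact data (placeholder)
dims = spark.table("staging.customers")      # small dimension data (placeholder)

# 1. Broadcast the small dimension to avoid shuffling the large side of the join.
joined = facts.join(broadcast(dims), "customer_id")

# 2. Repartition on the aggregation key so the shuffle distributes evenly across executors.
aggregated = (joined
              .repartition(200, "customer_id")
              .groupBy("customer_id")
              .count())

# 3. Cache a result that is read more than once downstream, then materialize it.
aggregated.cache()
aggregated.count()
```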

Environment: MS SQL Server, Tableau, Power BI, MS Office, SSRS, SSIS, SAS, PL/SQL, Azure.

Confidential

Hadoop Developer

Responsibilities:

  • Worked with the application teams to install Hadoop updates, patches, and version upgrades as required.
  • Analyzed and wrote Hadoop MapReduce jobs using the Java API, Pig, and Hive.
  • Created HBase tables to store data arriving in varying formats from different portfolios.
  • Configured core-site.xml and mapred-site.xml for the multi-node cluster environment.
  • Experienced with Hadoop ecosystem components including HDFS, MapReduce, Hive, Pig, Sqoop, and HBase.
  • Experienced in installing and configuring Windows Active Directory.
  • Strong knowledge of HDFS, MapReduce, and NoSQL databases such as HBase.
  • Experienced with client-side technologies such as HTML, CSS, JavaScript, and jQuery.
