Sr. Big Data Developer Resume

Jersey City, NJ

SUMMARY

  • Over 9 years of experience leveraging big data tools; AWS certified.
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation, and how they translate to MapReduce jobs.
  • Hands-on experience in test-driven development and Software Development Life Cycle (SDLC) methodologies such as Agile and Scrum.
  • Experienced in performing real-time analytics on distributed NoSQL databases such as Cassandra, HBase, and MongoDB.
  • Good understanding of designing clear data visualization dashboards using Tableau.
  • Developed Scala scripts and UDFs using both DataFrames and RDDs in Spark for data aggregation, queries, and writing data back into OLTP systems.
  • Created batch data processing with Spark's Scala API and developed data ingestion pipelines using Kafka.
  • Hands-on experience in designing and developing POCs in Spark to compare its performance against Hive and SQL/Oracle using Scala.
  • Used Flume and Kafka to direct data from different sources to/from HDFS.
  • Worked with the AWS cloud, creating EMR clusters with Spark to process and analyze raw data accessed from S3 buckets.
  • Scripted an ETL pipeline in Python that ingests files from AWS S3 into Redshift tables (a minimal sketch follows this summary).
  • Hands on experience with various file formats such as ORC, Avro, Parquet and JSON.
  • Knowledge of using the Databricks platform, Cloudera Manager, and the Hortonworks distribution to monitor and manage clusters.
  • Excellent implementation knowledge of enterprise/web/client-server applications using Java and J2EE.
  • Expertise in working with Linux/UNIX and shell commands on the terminal.
  • Expertise with Python, Scala, and Java in the design, development, administration, and support of large-scale distributed systems.
  • Experience with CQL (Cassandra Query Language) for retrieving data from Cassandra clusters.
  • Ability to develop MapReduce programs using Java and Python.
  • Good knowledge of Amazon AWS concepts such as the EMR and EC2 web services, which provide fast and efficient processing.
  • Good understanding of and exposure to Python programming.
  • Experience with IDEs such as Eclipse and IntelliJ, and with repositories such as SVN and Git.
  • Exported and imported data to and from Oracle using SQL Developer for analysis.
  • Good experience in using Sqoop for traditional RDBMS data pulls and worked with different distributions of Hadoop like Hortonworks and Cloudera.
  • Experience in designing components using UML: use case, class, sequence, deployment, and component diagrams for the requirements.
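
A minimal sketch of the S3-to-Redshift ingestion pattern mentioned above; the bucket, table, IAM role, and cluster endpoint names are hypothetical, not taken from the actual project.

    # S3-to-Redshift load sketch (hypothetical names throughout).
    import psycopg2

    S3_PATH = "s3://example-bucket/incoming/orders.csv"              # hypothetical path
    IAM_ROLE = "arn:aws:iam::123456789012:role/redshift-copy-role"   # hypothetical role

    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",  # hypothetical endpoint
        port=5439, dbname="analytics", user="etl_user", password="...",
    )
    with conn, conn.cursor() as cur:
        # Redshift's COPY command bulk-loads the staged file directly from S3.
        cur.execute(f"""
            COPY public.orders
            FROM '{S3_PATH}'
            IAM_ROLE '{IAM_ROLE}'
            FORMAT AS CSV
            IGNOREHEADER 1;
        """)
    conn.close()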

TECHNICAL SKILLS

Big Data Ecosystem: MapReduce, HDFS, Hive 2.3, HBase 1.2, Pig, Sqoop, Flume 1.8, HDP, Oozie, ZooKeeper, Spark, Kafka, Storm, Hue

Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks

Cloud Services: Amazon AWS, EC2, Redshift, Docker, Kubernetes, AWS ECS, Terraform, AWS CloudFormation, AWS CloudWatch, AWS X-Ray, AWS CloudTrail.

Relational Databases: Oracle 12c, MySQL, MS SQL Server 2016

NoSQL Databases: HBase, Hive 2.3, and MongoDB

Version Control: GIT, GitLab, SVN

Programming Languages: Java, Python, Scala, SQL, PL/SQL, HiveQL, UNIX Shell Scripting.

Software Development & Testing Life Cycle: UML, Design Patterns (Core Java and J2EE), Software Development Life Cycle (SDLC), Waterfall and Agile models, STLC

Web Technologies: JavaScript, CSS, HTML and JSP.

Operating Systems: Windows, UNIX/Linux and Mac OS.

Build Management Tools: Maven, Ant.

IDE & Command line tools: Eclipse, IntelliJ, Toad and NetBeans.

PROFESSIONAL EXPERIENCE

Confidential - Jersey City, NJ

Sr. Big Data Developer

Responsibilities:

  • Working as a Sr. Big Data Developer with big data and Hadoop ecosystem components.
  • Worked in an Agile development environment and participated in daily scrums and other design-related meetings.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
  • Designed and developed various modules of the application with frameworks like Spring MVC and Spring Web Flow, using Spring Boot, Bean Factory, IoC, and AOP concepts.
  • Worked on Spark SQL: created DataFrames by loading data from Hive tables, prepared the data, and stored it in AWS S3 (a minimal sketch appears at the end of this list).
  • Designed the front-end application and interactive web pages using web technologies like AngularJS in conjunction with Bootstrap to make the pages responsive.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
  • Created FTP and Sqoop scripts to pull data from DB2 and save it in AWS in Avro format.
  • Developed NiFi flows to move data from different sources to HDFS and from HDFS to S3 buckets.
  • Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Responsible for loading customer data and event logs from Kafka into HBase using the REST API.
  • Developed the batch scripts to fetch the data from AWS S3 storage and do required transformations in Scala using Spark framework.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building common learner data model which gets the data from Kafka in near real-time and persist it to HBase.
  • Developed various data loading strategies and performed various transformations for analyzing the datasets by using Hortonworks Distribution for Hadoop ecosystem.
  • Developed Spark scripts in Java and Python, and used shell commands as per the requirements.
  • Involved in ingesting data received from various relational database providers onto HDFS for analysis and other big data operations.
  • Used AWS infrastructure to host the portal. Used EC2, RDS, and S3 features of AWS.
  • Experienced in bringing up EMR cluster and deploying code into the cluster in S3 buckets.
  • Migrated the existing on-premises code to an AWS EMR cluster.
  • Implemented Spark to migrate MapReduce jobs into Spark RDD transformations and streamed data using Spark Streaming.
  • Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames, and loaded the data.
  • Worked with teams in setting up AWS EC2 instances by using different AWS services.
  • Extensively worked with Cloudera Hadoop distribution components and custom packages.
  • Installed and Configured Apache Hadoop clusters for application development and Hadoop tools.
  • Implemented AWS Redshift (a petabyte-scale data warehouse service in the cloud).
  • Implemented multiple MapReduce Jobs in java for data cleansing and pre-processing.
  • Wrote complex Hive queries and UDFs in Java and Python.
  • Analyzed the data by performing Hive queries (HiveQL), Pig scripts, Spark SQL and Spark streaming.
  • Developed tools using Python, shell scripting, and XML to automate routine tasks.
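
A minimal PySpark sketch of the Hive-to-Spark pattern described in this list (loading a Hive table as a DataFrame, preparing it, and writing it to S3); the table and bucket names are hypothetical.

    # Hive-to-Spark preparation sketch (hypothetical table and bucket names).
    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("hive-to-spark-prep")
             .enableHiveSupport()
             .getOrCreate())

    # Read the Hive table as a DataFrame instead of running the query in Hive.
    events = spark.table("warehouse.customer_events")

    # Equivalent of a Hive GROUP BY / aggregation, expressed as DataFrame operations.
    prep = (events
            .filter(F.col("event_date") >= "2019-01-01")
            .groupBy("customer_id")
            .agg(F.count("*").alias("event_count"),
                 F.max("event_date").alias("last_seen")))

    # Persist the prepared data to S3 in Parquet for downstream jobs.
    prep.write.mode("overwrite").parquet("s3a://example-prep-bucket/customer_prep/")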

Environment: Hadoop 3.0, Agile, Hive 2.3, SQL, Spark, Python, AWS, MVC, NoSQL, HBase 2.1, Hortonworks, XML

Confidential - Peoria IL

Big Data Developer

Responsibilities:

  • Worked as a Big Data Developer providing solutions to big data problems.
  • Responsible for automating build processes towards CI/CD automation goals.
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
  • Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating Hive with existing applications.
  • Involved in file movements between HDFS and AWS S3 and extensively worked with S3 bucket in AWS.
  • Developed predictive analytics using Apache Spark Scala APIs.
  • Developed MapReduce jobs with the Java API to parse the raw data and store the refined data.
  • Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
  • Created tables in HBase to store variable data formats of PII data coming from different portfolios.
  • Implemented best income logic using Pig scripts.
  • Involved in a fully automated CI/CD pipeline process through GitHub, Jenkins, and Puppet.
  • Built and deployed Docker containers to improve developer workflow, increasing scalability and optimization.
  • Used AWS CloudTrail for audit findings and CloudWatch for monitoring AWS resources.
  • Involved in identifying job dependencies to design workflow for Oozie & YARN resource management.
  • Moved data with Sqoop from HDFS to relational database systems and vice versa, and handled maintenance and troubleshooting.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
  • Worked extensively with importing metadata into Hive using Python and migrated existing tables and applications to work on AWS cloud (S3).
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Involved in scheduling Oozie workflows to automatically update the firewall.
  • Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Worked in the BI team in the area of Big Data Hadoop cluster implementation and data integration in developing large-scale system software.
  • Worked in AWS EC2, configuring the servers for Auto scaling and Elastic load balancing.
  • Developed NiFi flows dealing with various kinds of data formats.
  • Implemented test scripts to support test driven development and continuous integration.
  • Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
  • Responsible for managing data coming from different sources.
  • Imported millions of rows of structured data from relational databases using Sqoop to process with Spark, and stored the data in HDFS in CSV format (a minimal sketch appears at the end of this list).
  • Worked on tuning Hive and Pig to improve performance and solved performance issues in both scripts.
  • Used Spark SQL to process the huge amount of structured data.
  • Developed a Spark Streaming application to pull data from the cloud into Hive tables.
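
A minimal PySpark sketch of processing Sqoop-landed CSV data from HDFS, as described in this list; the HDFS path, column names, and target Hive table are hypothetical.

    # Processing Sqoop-landed CSV files from HDFS (hypothetical paths and names).
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("sqoop-csv-processing")
             .enableHiveSupport()
             .getOrCreate())

    # A Sqoop import typically lands delimited text under a per-table HDFS directory.
    accounts = (spark.read
                .option("header", "false")
                .option("inferSchema", "true")
                .csv("hdfs:///data/sqoop/accounts/")
                .toDF("account_id", "customer_id", "balance", "opened_date"))

    # Basic cleansing before exposing the data to Hive for analysts.
    cleaned = (accounts
               .dropDuplicates(["account_id"])
               .na.drop(subset=["customer_id"]))
    cleaned.write.mode("overwrite").saveAsTable("staging.accounts")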

Environment: Agile, Hadoop 3.0, HBase 2.1, CI, CD, Sqoop, Hive 1.9, AWS, HDFS, Scala, Spark, S3, CSV

Confidential - Menlo Park, CA

AWS Developer

Responsibilities:

  • Worked on multiple AWS accounts with different VPCs for Prod and Non-Prod, where key objectives included automation, build-out, integration, and cost control.
  • Launched Amazon EC2 cloud instances using Amazon Web Services (Linux/CentOS/Ubuntu/RHEL) and configured the launched instances for specific applications.
  • Assisted in migrating the existing data center into the AWS environment.
  • Worked on AWS services including EC2, Auto Scaling for launching EC2 instances, Elastic Load Balancer, Elastic Beanstalk, S3, Glacier, CloudFront, RDS, VPC, CloudWatch, CloudFormation, EMR, IAM, and SNS.
  • Created S3 buckets, managed their policies, and utilized S3 and Glacier for archival storage and backup on AWS.
  • Utilized Kubernetes and Docker as the runtime environment for the CI/CD system to build, test, and deploy.
  • Managed containers using Docker by writing Dockerfiles, set up automated builds on Docker Hub, and installed and configured Kubernetes.
  • Designed and implemented the configuration of topics in a new Kafka cluster.
  • Installed Kafka Manager to track consumer lag and monitor Kafka metrics, and also used it for adding topics, partitions, etc.
  • Used security groups, network ACLs, Internet Gateways, NAT instances and Route tables to ensure a secure zone for organizations in AWS public cloud.
  • Responsible for build and deployment automation using VMware ESX, Docker, Kubernetes containers, and Ansible.
  • Used Ansible and Ansible Tower as configuration management tools to automate repetitive tasks, quickly deploy critical applications, and proactively manage change.
  • Used Ansible Playbooks to set up a Continuous Delivery pipeline and deployed microservices, including provisioning AWS environments with Ansible Playbooks.
  • Created Python scripts that integrated with the Amazon API to control instance operations (a minimal sketch appears at the end of this list).
  • Worked on Administration and Architecture of Cloud platforms.
  • Maintained the monitoring and alerting of production and corporate servers using Cloud Watch service.
  • Migrated applications from internal data center to AWS.
  • Deployed and configured Git repositories with branching, tagging, and notifications. Experienced and proficient in deploying and administering GitHub.
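
A minimal boto3 sketch of the Python instance-control scripting mentioned in this list; the region, tag key, and tag value are hypothetical.

    # Stop running EC2 instances by tag (hypothetical region and tag values).
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    def stop_tagged_instances(env_tag="nonprod"):
        """Stop every running instance tagged Environment=<env_tag>."""
        reservations = ec2.describe_instances(
            Filters=[
                {"Name": "tag:Environment", "Values": [env_tag]},
                {"Name": "instance-state-name", "Values": ["running"]},
            ]
        )["Reservations"]
        instance_ids = [i["InstanceId"]
                        for r in reservations for i in r["Instances"]]
        if instance_ids:
            ec2.stop_instances(InstanceIds=instance_ids)
        return instance_ids

    if __name__ == "__main__":
        print("Stopped:", stop_tagged_instances())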

Environment: AWS, EC2, CI, CD, S3, Docker, Kafka, Jenkins, Python

Confidential - McLean, VA

Python Developer

Responsibilities:

  • Developed frontend and backend modules using Python on Django, including the Tastypie web framework, with Git for version control.
  • Implemented SQLAlchemy, a Python library that provides full access to SQL.
  • Developed views and templates with Python and Django's view controller and templating language to create a user-friendly website interface.
  • Used the pandas library for statistical analysis and for flexible reshaping and pivoting of data sets.
  • Developed cross-browser/cross-platform pages with ReactJS, Node.js, jQuery, AJAX, and HTML/CSS to the desired design specs for a single-page layout, following coding standards.
  • Created the UI from scratch using ReactJS.
  • Developed the presentation layer using HTML, CSS, JavaScript, jQuery, and AJAX, utilizing Python libraries.
  • Used Django configuration to manage URLs and application parameters.
  • Installed, configured, and managed the AWS server, and used AWS Data Pipeline for data extraction, transformation, and loading from homogeneous and heterogeneous data sources.
  • Accessed database objects using the Django database APIs, and worked on Python-based test frameworks and test-driven development with automation tools (a minimal sketch appears at the end of this list).
  • Wrote and executed various SQL queries from Python using the MySQL connector and MySQLdb packages.
  • Performed front-end development for web initiatives to ensure usability, using HTML, CSS, Bootstrap, and JavaScript.
  • Developed the required XML Schema documents and implemented the framework for parsing XML documents.
  • Queried the MySQL database from Python using the MySQL connector and MySQLdb packages to retrieve information.
  • Designed and developed a system for data management using MySQL, with queries optimized to improve performance.
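
A minimal Django sketch of the model/view pattern described in this list; the app, model, fields, and template names are hypothetical and not taken from the project.

    # Django model plus a view that queries it through the ORM (hypothetical names).
    from django.db import models
    from django.shortcuts import render

    class Order(models.Model):
        customer_name = models.CharField(max_length=100)
        total = models.DecimalField(max_digits=10, decimal_places=2)
        created_at = models.DateTimeField(auto_now_add=True)

        class Meta:
            app_label = "store"  # hypothetical app

    def recent_orders(request):
        """Fetch the 20 most recent orders via the ORM and render them in a template."""
        orders = Order.objects.order_by("-created_at")[:20]
        return render(request, "store/recent_orders.html", {"orders": orders})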

Environment: Python, Django, SQL, ReactJS, Node.js, jQuery, AJAX, HTML5, CSS3, JavaScript, AWS

Confidential - Houston, TX

Cloud Developer

Responsibilities:

  • Involved in the analysis, definition, design, implementation and deployment of full software development life-cycle (SDLC) of the project.
  • Worked as a senior design engineer, mainly on C++, STL, data structures, UNIX, and multithreading.
  • Implemented AWS SysOps practices such as migrating on-premises databases to the cloud.
  • Experienced in Performance Tuning and Query Optimization in AWS Redshift.
  • Provisioned and managed compute using AWS Lambda, a serverless computing service.
  • Designed, coded, and implemented automated build scripting in Ant, Jenkins/Hudson, and Maven.
  • Experienced with Bitbucket and Git in creating code repositories, pushing changes to them, and resolving merge conflicts.
  • Programmed in C on the UNIX platform to contribute to the software project, which automated a customized design process.
  • Experienced in writing Python code for Lambda functions that perform the necessary logic and derive values, and in writing Python unit tests to verify that logic (a minimal sketch appears at the end of this list).
  • Experienced in developing middleware components for software in C/C++ using STL, multithreading, data structures, IPC (TCP/IP socket programming), SNMP and design patterns.
  • Worked on Antiphony and tried to access AWS console to leverage the resources to the desired level.
  • Upgraded Jenkins for an existing application and configured LDAP authentication within the existing software structure.
  • Experienced in creating Splunk dashboards for Lambda alerts and in using filter functions in Lambda.
  • Created monitors, alarms, and notifications for EC2 hosts using CloudWatch.
  • Responsible for end-to-end public cloud automation of application delivery, including infrastructure provisioning and integration with Continuous Integration/Continuous Delivery (CI/CD) platforms.
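
A minimal sketch of a Python Lambda handler plus a unit test, in the spirit of the bullets above; the event fields and the derived value are hypothetical.

    # Lambda handler that derives a value from the event, with a unit test (hypothetical fields).
    import unittest

    def lambda_handler(event, context):
        """Derive a total price from the incoming event payload."""
        quantity = int(event.get("quantity", 0))
        unit_price = float(event.get("unit_price", 0.0))
        return {"statusCode": 200, "total": round(quantity * unit_price, 2)}

    class LambdaHandlerTest(unittest.TestCase):
        def test_total_is_derived_from_quantity_and_price(self):
            result = lambda_handler({"quantity": 3, "unit_price": 2.5}, None)
            self.assertEqual(result["total"], 7.5)

    if __name__ == "__main__":
        unittest.main()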

Environment: AWS, Ant, Jenkins, UNIX, Python, C, CI, CD, C++
