
Sr. Big Data Developer Resume


Jersey City, NJ

SUMMARY

  • Overall 9+ years of experience in leveraging big data tools; also certified in AWS.
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation and how they translate into MapReduce jobs.
  • Hands-on experience in test-driven development and Software Development Life Cycle (SDLC) methodologies like Agile and Scrum.
  • Experienced with performing real-time analytics on NoSQL distributed databases like Cassandra, HBase and MongoDB.
  • Good understanding of designing attractive data visualization dashboards using Tableau.
  • Developed Scala scripts and UDFs using both DataFrames and RDDs in Spark for data aggregation, queries and writing data back into OLTP systems.
  • Created batch data pipelines using Spark with the Scala API and developed data ingestion pipelines using Kafka.
  • Hands on experience in designing and developing POCs in Spark to compare the performance of Spark with Hive and SQL/Oracle using Scala.
  • Used Flume and Kafka to direct data from different sources to/from HDFS.
  • Worked with the AWS cloud and created EMR clusters with Spark to process and analyze raw data and to access data from S3 buckets.
  • Scripted an ETL pipeline in Python that ingests files from AWS S3 into Redshift tables (see the sketch after this summary).
  • Hands on experience with various file formats such as ORC, Avro, Parquet and JSON.
  • Knowledge of using the Databricks platform, Cloudera Manager and Hortonworks Distribution to monitor and manage clusters.
  • Excellent implementation knowledge of Enterprise/Web/Client Server using Java, J2EE.
  • Expertise in working with Linux/Unix and shell commands on the Terminal.
  • Expertise with Python, Scala and Java in the design, development, administration and support of large-scale distributed systems.
  • Experience with CQL (Cassandra Query Language) for retrieving data from Cassandra clusters.
  • Ability to develop MapReduce programs using Java and Python.
  • Good knowledge of Amazon AWS concepts like EMR and EC2 web services, which provide fast and efficient processing.
  • Good understanding of and exposure to Python programming.
  • Experience in using various IDEs (Eclipse, IntelliJ) and repositories (SVN, Git).
  • Exported and imported data to and from Oracle using SQL Developer for analysis.
  • Good experience in using Sqoop for traditional RDBMS data pulls and worked with different distributions of Hadoop like Hortonworks and Cloudera.
  • Experience in designing components using UML: Use Case, Class, Sequence, Deployment and Component diagrams for the requirements.
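The S3-to-Redshift ETL bullet above can be illustrated with a minimal Python sketch; the bucket, prefix, staging table, IAM role and connection details below are hypothetical placeholders, not the original pipeline.

import boto3
import psycopg2

S3_BUCKET = "example-landing-bucket"                       # hypothetical bucket
S3_PREFIX = "incoming/orders/"                             # hypothetical prefix
IAM_ROLE = "arn:aws:iam::123456789012:role/redshift-copy"  # placeholder role ARN

def copy_new_files_to_redshift(conn):
    """List files under the prefix and COPY each one into a Redshift staging table."""
    s3 = boto3.client("s3")
    listing = s3.list_objects_v2(Bucket=S3_BUCKET, Prefix=S3_PREFIX)
    with conn.cursor() as cur:
        for obj in listing.get("Contents", []):
            cur.execute(
                f"COPY staging.orders "
                f"FROM 's3://{S3_BUCKET}/{obj['Key']}' "
                f"IAM_ROLE '{IAM_ROLE}' "
                f"FORMAT AS CSV IGNOREHEADER 1;"
            )
    conn.commit()

if __name__ == "__main__":
    # Connection details would normally come from a secrets store, not literals.
    conn = psycopg2.connect(host="example-cluster.redshift.amazonaws.com",
                            port=5439, dbname="analytics",
                            user="etl_user", password="change-me")
    copy_new_files_to_redshift(conn)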

TECHNICAL SKILLS

Big Data Ecosystem: MapReduce, HDFS, Hive 2.3, HBase 1.2, Pig, Sqoop, Flume 1.8, HDP, Oozie, ZooKeeper, Spark, Kafka, Storm, Hue

Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks

Cloud Services: Amazon AWS, EC2, Redshift, Docker, Kubernetes, AWS ECS, Terraform, AWS CloudFormation, AWS CloudWatch, X-Ray, AWS CloudTrail.

Relational Databases: Oracle 12c, MySQL, MS SQL Server 2016

NoSQL Databases: HBase, Hive 2.3, and MongoDB

Version Control: GIT, GitLab, SVN

Programming Languages: Java, Python, Scala, SQL, PL/SQL, HiveQL, UNIX Shell Scripting.

Software Development & Testing Lifecycle: UML, Design Patterns (Core Java and J2EE), SDLC (Waterfall and Agile), STLC

Web Technologies: JavaScript, CSS, HTML and JSP.

Operating Systems: Windows, UNIX/Linux and Mac OS.

Build Management Tools: Maven, Ant.

IDE & Command line tools: Eclipse, IntelliJ, Toad and NetBeans.

PROFESSIONAL EXPERIENCE

Confidential - Jersey City, NJ

Sr. Big Data Developer

Responsibilities:

  • Working as a Sr. Big Data Developer with big data and Hadoop ecosystem components.
  • Worked in an Agile development environment and participated in daily scrums and other design-related meetings.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python.
  • Designed and developed various modules of the application with frameworks like Spring MVC and Web Flow, using Spring Boot with the Spring Bean Factory, IoC and AOP concepts.
  • Worked on Spark SQL, creating DataFrames by loading data from Hive tables, preparing the data and storing it in AWS S3 (see the first sketch after this list).
  • Designed the front-end application and user-interactive web pages using web technologies like Angular.js in conjunction with Bootstrap to make web pages responsive.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Created FTP scripts to Sqoop data from DB2 and save it in AWS in Avro format.
  • Developed NiFi flows to move data from different sources to HDFS and from HDFS to S3 buckets.
  • Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Responsible for loading the customer's data and event logs from Kafka into HBase using a REST API.
  • Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations in Scala using the Spark framework.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building a common learner data model that gets data from Kafka in near real time and persists it to HBase (see the streaming sketch after this list).
  • Developed various data loading strategies and performed various transformations for analyzing the datasets by using Hortonworks Distribution for Hadoop ecosystem.
  • Developed Spark scripts using Java and Python, and shell commands as per the requirements.
  • Involved with ingesting data received from various relational database providers, on HDFS for analysis and other big data operations.
  • Used AWS infrastructure to host the portal. Used EC2, RDS, and S3 features of AWS.
  • Experienced in bringing up EMR cluster and deploying code into the cluster in S3 buckets.
  • Migrated the existing on-prem code to an AWS EMR cluster.
  • Implemented Spark to migrate MapReduce jobs into Spark RDD transformations and to stream data using Spark Streaming.
  • Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded the data.
  • Worked with teams in setting up AWS EC2 instances by using different AWS services.
  • Extensively worked with Cloudera Hadoop distribution components and custom packages.
  • Installed and configured Apache Hadoop clusters for application development and Hadoop tools.
  • Implemented AWS Redshift (a petabyte-scale data warehouse service in the cloud).
  • Implemented multiple MapReduce Jobs in java for data cleansing and pre-processing.
  • Wrote complex Hive queries and UDFs in Java and Python.
  • Analyzed the data by performing Hive queries (HiveQL), Pig scripts, Spark SQL and Spark Streaming.
  • Developed tools using Python, Shell scripting, XML to automate some of the menial tasks.
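A minimal PySpark sketch of the Hive-to-S3 prep step described in the bullets above; the database, table, column and bucket names are assumed placeholders rather than the actual project objects.

from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("hive-to-s3-prep")
         .enableHiveSupport()          # needed to read managed Hive tables
         .getOrCreate())

# Load raw events from a Hive table and aggregate them per customer and day.
events = spark.table("raw_db.customer_events")            # hypothetical Hive table
prep = (events
        .filter(F.col("event_type").isNotNull())
        .groupBy("customer_id", F.to_date("event_ts").alias("event_date"))
        .agg(F.count("*").alias("event_count")))

# Persist the prepared data to S3 as Parquet, partitioned by date.
(prep.write
     .mode("overwrite")
     .partitionBy("event_date")
     .parquet("s3a://example-prep-bucket/customer_events/"))  # placeholder bucket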
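The Kafka-to-HBase streaming bullet can be sketched with Spark Structured Streaming; this assumes the spark-sql-kafka connector is on the classpath and uses the happybase client for the HBase write, with broker, topic, table and column-family names as placeholders.

import happybase
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("learner-stream").getOrCreate()

# Read learner events from Kafka as a streaming DataFrame.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
          .option("subscribe", "learner-events")               # placeholder topic
          .load()
          .selectExpr("CAST(key AS STRING) AS k", "CAST(value AS STRING) AS v"))

def write_batch_to_hbase(batch_df, batch_id):
    """Persist one micro-batch into a learner-profile HBase table."""
    conn = happybase.Connection("hbase-host")                  # placeholder host
    table = conn.table("learner_profile")                      # placeholder table
    for row in batch_df.collect():                             # fine for small demo batches
        key = (row.k or "unknown").encode()                    # guard against null Kafka keys
        table.put(key, {b"d:payload": row.v.encode()})
    conn.close()

query = (events.writeStream
         .foreachBatch(write_batch_to_hbase)
         .outputMode("append")
         .start())
query.awaitTermination()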

Environment: Hadoop 3.0, Agile, Hive 2.3, SQL, Spark, Python, AWS, MVC, NoSQL, HBase 2.1, Hortonworks, XML

Confidential - Peoria IL

Big Data Developer

Responsibilities:

  • Worked as a Big Data Developer providing solutions for big data problems.
  • Responsible for automating build processes towards CI/CD automation goals.
  • Involved in Agile methodologies, daily scrum meetings and sprint planning.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
  • Upgraded the Hadoop cluster from CDH3 to CDH4, set up a High Availability cluster and integrated Hive with existing applications.
  • Involved in file movements between HDFS and AWS S3 and extensively worked with S3 bucket in AWS.
  • Developed predictive analytics using Apache Spark Scala APIs.
  • Developed MapReduce jobs in Java API to parse the raw data and store the refined data.
  • Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
  • Created tables in HBase to store variable data formats of PII data coming from different portfolios.
  • Implemented best income logic using Pig scripts.
  • Involved in a fully automated CI/CD pipeline process through GitHub, Jenkins and Puppet.
  • Built and deployed Docker containers to improve developer workflow, increasing scalability and optimization.
  • Used AWS CloudTrail for audit findings and CloudWatch for monitoring AWS resources.
  • Involved in identifying job dependencies to design workflows for Oozie & YARN resource management.
  • Worked on moving data with Sqoop from HDFS to relational database systems and vice versa, along with maintenance and troubleshooting.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using the Spark context, Spark SQL, DataFrames and pair RDDs.
  • Worked extensively with importing metadata into Hive using Python and migrated existing tables and applications to work on AWS cloud (S3).
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Involved in scheduling Oozie workflow to automatically update the firewall.
  • Developed data pipelines using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Worked in the BI team in the area of Big Data Hadoop cluster implementation and data integration in developing large-scale system software.
  • Worked in AWS EC2, configuring the servers for Auto scaling and Elastic load balancing.
  • Developed Nifi flows dealing with various kinds of data formats.
  • Implemented test scripts to support test driven development and continuous integration.
  • Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
  • Responsible for managing data coming from different sources.
  • Imported millions of structured records from relational databases using Sqoop, processed them with Spark and stored the data in HDFS in CSV format (see the sketch after this list).
  • Worked on tuning Hive & Pig to improve performance and solved performance issues in both scripts.
  • Used Spark SQL to process large amounts of structured data.
  • Developed a Spark Streaming application to pull data from the cloud into Hive tables.
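A brief PySpark sketch of the Sqoop-landed CSV processing mentioned above; the HDFS path, column names and Hive table are assumptions for illustration, with Sqoop taken to have already written the RDBMS extract as CSV under the landing path.

from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("sqoop-csv-processing")
         .enableHiveSupport()
         .getOrCreate())

# Sqoop is assumed to have imported the source table as headerless CSV here.
orders = (spark.read
          .option("header", "false")
          .option("inferSchema", "true")
          .csv("hdfs:///data/landing/orders/")                       # placeholder path
          .toDF("order_id", "customer_id", "amount", "order_date"))  # assumed columns

# Simple cleanup and aggregation before persisting to a Hive table.
daily_totals = (orders
                .filter(F.col("amount") > 0)
                .groupBy("order_date")
                .agg(F.sum("amount").alias("daily_amount")))

daily_totals.write.mode("overwrite").saveAsTable("analytics.daily_order_totals")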

Environment: Agile, Hadoop 3.0, HBase 2.1, CI, CD, Sqoop, Hive 1.9, AWS, HDFS, Scala, Spark, S3, CSV

Confidential - Menlo Park, CA

AWS Developer

Responsibilities:

  • Worked on multiple AWS accounts with different VPCs for Prod and Non-Prod, where key objectives included automation, build-out, integration and cost control.
  • Launched Amazon EC2 cloud instances using Amazon Web Services (Linux/CentOS/Ubuntu/RHEL) and configured the launched instances for specific applications.
  • Assisted in migrating the existing data center into the AWS environment.
  • Worked on AWS services including EC2, Auto Scaling for launching EC2 instances, Elastic Load Balancer, Elastic Beanstalk, S3, Glacier, CloudFront, RDS, VPC, CloudWatch, CloudFormation, EMR, IAM and SNS.
  • Created S3 buckets, managed S3 bucket policies and utilized S3 and Glacier for archival storage and backup on AWS.
  • Utilized Kubernetes and Docker as the runtime environment of the CI/CD system to build, test and deploy.
  • Managed containers using Docker by writing Dockerfiles, set up automated builds on Docker Hub, and installed and configured Kubernetes.
  • Designed and implemented topic configurations in a new Kafka cluster.
  • Installed Kafka Manager to track consumer lag and monitor Kafka metrics; also used it for adding topics, partitions, etc.
  • Used security groups, network ACLs, Internet Gateways, NAT instances and Route tables to ensure a secure zone for organizations in AWS public cloud.
  • Responsible for build and deployment automation using VMware ESX, Docker, Kubernetes containers and Ansible.
  • Used Ansible and Ansible Tower as configuration management tools to automate repetitive tasks, quickly deploy critical applications and proactively manage change.
  • Used Ansible Playbooks to set up a Continuous Delivery pipeline. Deployed microservices, including provisioning AWS environments, using Ansible Playbooks.
  • Created scripts in Python that integrated with Amazon APIs to control instance operations (see the sketch after this list).
  • Worked on Administration and Architecture of Cloud platforms.
  • Maintained the monitoring and alerting of production and corporate servers using the CloudWatch service.
  • Migrated applications from internal data center to AWS.
  • Deployed and configured Git repositories with branching, tagging, and notifications. Experienced and proficient in deploying and administering GitHub.

Environment: AWS, EC2, CI, CD, S3, Docker, Kafka, Jenkins, Python

Confidential - McLean, VA

Python Developer

Responsibilities:

  • Developed frontend and backend modules using Python on Django, including the Tastypie web framework, using Git.
  • Implemented SQLAlchemy, a Python library that provides full access to SQL.
  • Developed views and templates with Python and Django's view controller and templating language to create a user-friendly website interface.
  • Used the Pandas library for statistical analysis and for flexible reshaping and pivoting of data sets.
  • Developed cross-browser/platform pages with ReactJS, Node.js, jQuery, AJAX and HTML/CSS to desired design specs for a single-page layout using code standards.
  • Created the UI from scratch using ReactJS.
  • Developed the presentation layer using HTML, CSS, JavaScript, jQuery and AJAX, utilizing Python libraries.
  • Used Django configuration to manage URLs and application parameters.
  • Installed, configured and managed AWS servers; used AWS Data Pipeline for data extraction, transformation and loading from homogeneous or heterogeneous data sources.
  • Accessed database objects using Django Database APIs. Worked on python-based test frameworks and test-driven development with automation tools.
  • Wrote and executed various SQL queries from Python using the Python MySQL connector and MySQLdb package (see the sketch after this list).
  • Performed front-end development for web initiatives to ensure usability, using HTML, CSS, Bootstrap, and JavaScript.
  • Developed the required XML Schema documents and implemented the framework for parsing XML documents.
  • Queried the MySQL database from Python using the MySQL connector and MySQLdb package to retrieve information.
  • Designed and developed a system for data management using MySQL and queries were optimized to improve the performance.

Environment: Python, Django, SQL, ReactJS, Node.js, jQuery, AJAX, HTML5, CSS3, JavaScript, AWS

Confidential - Houston, TX

Cloud Developer

Responsibilities:

  • Involved in the analysis, definition, design, implementation and deployment of full software development life-cycle (SDLC) of the project.
  • Worked as a senior design engineer, mainly on C++, STL, data structures, UNIX and multithreading.
  • Implemented AWS SysOps policies such as migrating on-premise databases to the cloud.
  • Experienced in Performance Tuning and Query Optimization in AWS Redshift.
  • Provisioned and managed servers using AWS Lambda, which is serverless computing.
  • Designed, coded, and implemented automated build scripting in Ant, Jenkins/Hudson, and Maven.
  • Experience with Bitbucket and Git in creating the code repository, pushing changes to the code repository and resolving any merge conflicts.
  • Programmed in C on the UNIX platform to contribute to the software project, which automated a customized design process.
  • Experienced in writing Python code for Lambda functions to perform the necessary logic and derive values, as well as writing Python unit test cases to check that logic (see the sketch after this list).
  • Experienced in developing middleware components for software in C/C++ using STL, multithreading, data structures, IPC (TCP/IP socket programming), SNMP and design patterns.
  • Worked on Antiphony and accessed the AWS console to leverage resources to the desired level.
  • Upgraded Jenkins for an existing application and configured LDAP authentication within the existing software structure.
  • Experienced in creating Splunk dashboards for Lambda alerts and in using filter functions in Lambda.
  • Created monitors, alarms and notifications for EC2 hosts using CloudWatch.
  • Responsible for end-to-end public cloud automation of application delivery, including infrastructure provisioning and integration with Continuous Integration/Continuous Delivery (CI/CD) platforms.
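A short sketch of the Lambda-function pattern described above, pairing a handler with unit tests for its logic; the event fields and the discount rule are invented examples, not the actual business logic.

import unittest

def derive_discount(order_total):
    """Illustrative business logic the handler delegates to: 10% off orders over 100."""
    return round(order_total * 0.10, 2) if order_total > 100 else 0.0

def lambda_handler(event, context):
    """Entry point invoked by AWS Lambda with a JSON event."""
    total = float(event.get("order_total", 0))
    return {"statusCode": 200, "discount": derive_discount(total)}

class DeriveDiscountTest(unittest.TestCase):
    def test_discount_applied_over_threshold(self):
        self.assertEqual(derive_discount(200.0), 20.0)

    def test_no_discount_at_or_below_threshold(self):
        self.assertEqual(derive_discount(100.0), 0.0)

if __name__ == "__main__":
    unittest.main()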

Environment: AWS, Ant, Jenkins, UNIX, Python, C, CI, CD, C++
