Big Data Developer Resume
Lafayette, LA
SUMMARY
- Over 9 years of IT experience in application design, development, testing, technical support, and guidance using Python, Java, and cloud tools.
- Experience working in all stages of the Software Development Life Cycle (SDLC), including requirements, analysis and design, implementation, integration and testing, deployment, and maintenance.
- Hands-on experience using Hadoop technologies such as MapReduce, HDFS, Hive, Spark, Oozie, Pig, Kafka, Flume, NiFi, Impala, Storm, and ZooKeeper.
- Hands-on experience developing a Spark Streaming application in Scala that consumes data from Kafka, writes it to other Kafka topics, and persists it to HBase.
- Hands-on experience developing Scala processors that consume from and publish to Kafka topics.
- Experienced with Git as a version control system and with Ansible for application deployment and task automation.
- Experience using Rundeck to deploy applications and Kibana to check application logs.
- Hands-on experience importing/exporting data between HDFS and RDBMS using Sqoop.
- Hands-on experience with NoSQL databases: Cassandra, HBase, and MongoDB.
- Good understanding of distributed technologies such as Spark and Hadoop.
- Experience working with Elastic MapReduce (EMR) and setting up environments on Amazon EC2 instances.
- Experience loading data into Spark RDDs and performing in-memory computation to generate output responses.
- Hands-on experience with the AWS cloud platform, including services such as EC2, S3, EBS, VPC, ELB, IAM, Lambda, and Auto Scaling.
- Good knowledge of Route 53, CloudFront, CloudWatch, CloudTrail, CloudFormation, and Docker.
- Good knowledge of Snowflake, Airflow, and Netflix Genie.
- Experience using the AWS SDKs for Java and Python.
- Experience uploading data, hosting static websites, encrypting data, implementing bucket policies, and setting up CORS in S3 using the web console, the AWS CLI, and the AWS SDK for Python (Boto3); a brief Boto3 sketch follows this summary.
- Experienced with Jenkins as a continuous integration/continuous deployment (CI/CD) tool, with strong experience in the Ant and Maven build frameworks.
- Experience in web technologies such as Core Java, HTML5, CSS3, AJAX, XHTML, JavaScript, jQuery, and Bootstrap.
- Strong abilities in design patterns, database design, normalization, and writing stored procedures, triggers, views, and functions in MS SQL Server, Oracle, and PostgreSQL.
- Adept at preparing business requirements documents, defining project plans, and writing system requirements specifications.
- Wrote Python classes around the respective APIs so they could be incorporated into the overall application.
- Experience setting up and developing flows in Apache NiFi using processors and process groups.
- Worked with different Hadoop distributions, including Hortonworks and Cloudera.
- Worked with different file formats such as SequenceFile, Avro, ORC, Parquet, and CSV, using various compression techniques.
- Experience with Apache Flume for collecting, aggregating, and moving large amounts of data from application servers.
- Expertise in writing custom UDFs to incorporate complex business logic into Hive queries.
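The S3 work summarized above can be illustrated with a brief, hypothetical Boto3 sketch; the bucket name, object key, origin, and policy are placeholders rather than details from any project below.

```python
import json
import boto3

s3 = boto3.client("s3")
bucket = "example-data-bucket"  # placeholder bucket name

# Upload an object with server-side encryption enabled
s3.upload_file("report.csv", bucket, "reports/report.csv",
               ExtraArgs={"ServerSideEncryption": "AES256"})

# Allow cross-origin GET requests from a single site
s3.put_bucket_cors(
    Bucket=bucket,
    CORSConfiguration={"CORSRules": [{
        "AllowedMethods": ["GET"],
        "AllowedOrigins": ["https://example.com"],
        "AllowedHeaders": ["*"],
    }]},
)

# Grant public read access to the static-website prefix only
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::{}/site/*".format(bucket),
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```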
TECHNICAL SKILLS
Big Data Technologies: HDFS, MapReduce, Spark, Apache Flume, Kafka, Pig, Hive, Oozie, Apache Solr, YARN, ZooKeeper, Sqoop, Impala
Databases: Oracle 10g/11g, MySQL, PostgreSQL, Cassandra, HBase, MongoDB, Spark SQL
Programming Languages: Python, Scala, Java, JavaScript, Spring Boot
Cloud Tools: Amazon EC2, Amazon S3, DynamoDB, Redshift, IAM, CloudFront, Elastic MapReduce, Lambda, Kinesis, Route 53, CloudTrail, AWS Data Pipeline, AWS Database Migration Service
ETL Tools: Informatica
Data Analytics: Python, NumPy, and SciPy
Build Tools: Apache Maven, SBT
Platforms: Linux, Mac, Windows
Version Control: GIT and SVN
PROFESSIONAL EXPERIENCE
Confidential, Lafayette LA
Big Data Developer
Responsibilities:
- Analyzed project documentation and converted it into technical requirements.
- Performed requirement analysis and estimated project timelines.
- Participated in sprint planning and releases.
- Delivered code efficiently based on Test-Driven Development (TDD) and continuous integration, in line with Agile methodology and the Scrum process.
- Developed a Spark Streaming application in Scala that consumes messages from a Kafka topic and publishes them to another Kafka topic (see the sketch after this list).
- Developed Scala processors that consume messages from a Kafka topic, process them, and send them to another Kafka topic as well as persist them into HBase in Avro format.
- Prepared case classes based on the data models to be written to Kafka and HBase.
- Assisted in writing an HTTP client that calls an external service to fetch additional details about a message and publishes them to a Kafka topic.
- Assisted in writing a Terminology API that queries MongoDB to fetch additional details about a message and publishes them to another Kafka topic.
- Transformed the detailed messages fetched from the external service/API according to the defined data models.
- Created Hive tables, as well as views on top of them, based on the data models stored in HBase.
- Added metrics and logging to monitor application behavior in Grafana dashboards.
- Updated Ansible configurations and used Rundeck to deploy the applications to different environments.
- Staged the application in production by writing the transformed models to a dead topic to verify application behavior.
- Troubleshot application behavior and tuned Kafka configurations to make the application more stable.
- Loaded existing tables into Spark DataFrames/Datasets and created new Hive tables after performing the necessary transformations based on business requirements.
- Used Kibana to check the logs of the application.
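A minimal sketch of the Kafka consume-and-republish pattern described above, shown here with PySpark Structured Streaming rather than the original Scala application; the broker address, topic names, and checkpoint path are placeholders, and the spark-sql-kafka package must be on the classpath.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-passthrough").getOrCreate()

# Consume messages from the source topic (names are placeholders)
source = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "events.in")
          .load())

# Transformations on the message value would go here before republishing
out = source.selectExpr("CAST(key AS STRING) AS key",
                        "CAST(value AS STRING) AS value")

# Publish the processed records to another topic
query = (out.writeStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092")
         .option("topic", "events.out")
         .option("checkpointLocation", "/tmp/checkpoints/kafka-passthrough")
         .start())

query.awaitTermination()
```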
Environment: Hadoop, YARN, Spark-Core, Spark-Streaming, Spark-SQL, Scala, Kafka, Hive, Avro, Ibis, Impala, HBase, MongoDB, Ansible, Rundeck, UDeploy, Kibana, Git, Linux, Robo Mongo.
Confidential, Charlotte NC
Big Data Cloud Analytics Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop and Spark.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Developed Python scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation, queries, and writing data back into the OLTP system through Sqoop (see the sketch after this list).
- Performance-tuned Spark applications (Spark Streaming, Spark SQL) by setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
- Wrote UDFs to parse data as per requirements in Spark Streaming using SparkContext.
- Loaded data into Spark RDDs/DataFrames and performed in-memory computation to generate output responses.
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Python.
- Handled large datasets during the ingestion process itself using partitioning, Spark's in-memory capabilities, broadcast variables, and effective and efficient joins and transformations.
- Migrated MapReduce programs into Spark transformations using Spark and Python.
- Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems, using RDBMS and NoSQL data stores for data access and analysis.
- Worked with Amazon AWS to set up Hadoop clusters.
- Created Hive tables and loaded and analyzed data using Hive queries.
- Developed Hive queries to process data and generate data cubes for visualization.
- Implemented schema extraction for Parquet and Avro file formats in Hive.
- Good knowledge of Talend Open Studio for designing ETL jobs for data processing.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
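A small, hypothetical PySpark example of the DataFrame/UDF aggregation work mentioned above; the input path, column names, and normalization logic are illustrative only, not taken from the project.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("learner-aggregation").getOrCreate()

# Hypothetical learner events already landed in HDFS as Parquet
events = spark.read.parquet("/data/landing/learner_events")

# Simple UDF standing in for real cleansing logic
@F.udf(returnType=StringType())
def normalize_course(code):
    return code.strip().upper() if code else None

# Aggregate events per course and day
daily = (events
         .withColumn("course", normalize_course(F.col("course_code")))
         .groupBy("course", "event_date")
         .agg(F.count("*").alias("events"),
              F.countDistinct("learner_id").alias("learners")))

daily.write.mode("overwrite").parquet("/data/curated/learner_daily_counts")
```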
Environment: Hadoop, YARN, Spark-Core, Spark-Streaming, Spark-SQL, Python, Kafka, Hive, Avro, Sqoop, Amazon AWS, Impala, Cassandra, Linux.
Confidential, AL/DE
Hadoop / AWS Developer
Responsibilities:
- Involved in gathering requirements from different teams to design the ETL migration process from RDBMS to the Hadoop cluster.
- Used AWS EMR to process the data.
- Extensively used Sqoop to import data from RDBMS and export it back.
- Used compression techniques (Snappy, gzip) with file formats such as Parquet, Avro, and SequenceFiles to make the best use of storage in HDFS (see the sketch after this list).
- Developed custom UDFs for cleansing and transforming data.
- Implemented Hive dynamic partitions and buckets based on downstream business requirements; analyzed existing Hive queries and implemented advanced queries using functions such as RANK to optimize query performance.
- Used Apache NiFi flows to convert raw XML data into ORC and Parquet files.
- Performed data integrity checks.
- Used Pig as an ETL tool for transformations, joins, and pre-aggregations before storing data in HDFS.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs.
- Implemented Oozie coordinators to schedule workflows, leveraging both data- and time-dependent properties.
- Used Subversion (SVN) for version control and code management.
- Wrote shell scripts to automate end-to-end jobs and used tools such as AutoSys for job scheduling.
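A brief, assumed PySpark illustration of the file-format and compression choices above; the paths and source data are placeholders, and the Avro write requires the spark-avro package on the classpath.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("format-compression-demo")
         .config("spark.sql.parquet.compression.codec", "snappy")
         .getOrCreate())

# Placeholder source data landed earlier (e.g., via Sqoop) as CSV
orders = spark.read.option("header", "true").csv("/data/raw/orders")

# Store the same data as snappy-compressed Parquet and as Avro
orders.write.mode("overwrite").parquet("/data/warehouse/orders_parquet")
orders.write.mode("overwrite").format("avro").save("/data/warehouse/orders_avro")
```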
Environment: Cloudera Hadoop, HDFS, Hive, Pig, MapReduce, Sqoop, ZooKeeper, AWS EMR, AutoSys, Oozie, Avro, SVN, Shell Scripting, Linux
Confidential, DE
Hadoop / AWS Developer
Responsibilities:
- Involved in developing Web services in Service Oriented Architecture (SOA).
- Configured AWS security groups, which act as virtual firewalls controlling traffic to one or more EC2 instances.
- Configured launch configurations, Auto Scaling groups, target groups, and Classic/Application Load Balancers.
- Configured AWS Identity and Access Management (IAM) to securely manage AWS users and groups, using policies and roles to allow or deny access to AWS resources.
- Worked with AWS CloudFront, creating and managing distributions to provide access to S3 buckets or HTTP servers running on EC2 instances.
- Designed and created CloudFormation templates to create stacks.
- Configured custom AWS CloudWatch metrics for detailed monitoring (see the Boto3 sketch after this list).
- Implemented continuous integration using Jenkins. Configured security to Jenkins and added multiple nodes for continuous deployments.
- Used shell scripting to load data from the edge node into HDFS.
- Configured a MySQL database to store Hive metadata.
- Applied a strong understanding of partitioning and bucketing concepts in Hive, designing both managed and external tables for optimized performance.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate into MapReduce jobs.
- Tuned Hive and Pig scripts to improve performance.
- Wrote MapReduce programs in Java in an MRv2/YARN environment.
- Troubleshot performance issues and tuned the Hadoop cluster.
- Imported and exported data into HDFS and Hive using Sqoop.
- Involved in the end-to-end process of Hadoop jobs using technologies such as Sqoop, Pig, Hive, MapReduce, and shell scripts (for scheduling a few jobs).
- Used Tableau 9.0 to create reports presenting analysis in graphical format.
- Managed and reviewed Hadoop log files.
- Created MapReduce jobs using Pig Latin and Hive queries.
- Used Git as version control for scripts and configurations.
- Used Amazon Route 53 to manage public and private hosted zones.
- Created SNS topics and managed subscriptions.
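A short, hypothetical Boto3 snippet in the spirit of the custom CloudWatch metrics mentioned above; the namespace, dimension, and value are made up for illustration.

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Publish a custom metric after an ingestion job finishes (values are illustrative)
cloudwatch.put_metric_data(
    Namespace="Ingestion/Hive",
    MetricData=[{
        "MetricName": "RowsLoaded",
        "Dimensions": [{"Name": "Table", "Value": "orders"}],
        "Value": 125000,
        "Unit": "Count",
    }],
)
```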
Environment: Hortonworks Data Platform 2.2, AWS EC2, S3, IAM, VPC, Python 2.7, Boto3, Jenkins, AWS CloudWatch, Git, Route 53, Linux, Hadoop, Sqoop, Flume, Oozie, MapReduce, HDFS, Pig, Hive, HBase, MySQL, Ubuntu
Confidential, DE
Python Developer
Responsibilities:
- Involved in the analysis, design and development of the project life cycle.
- Designed and created web pages using HTML, CSS, Bootstrap, JavaScript, jQuery, Ajax, and JSON.
- Developed RESTful APIs using Flask (see the sketch after this list).
- Developed web pages that connect to and interact with the database using Django and SQLAlchemy.
- Involved in database design based on requirements.
- Involved in writing and modifying stored procedures, views, and tables in SQL database.
- Used Python to write views, models, templates, and database queries; Django's MVC pattern handles the interaction between the model and the view, leaving the template/HTML file to present the data.
- Continuously maintained and troubleshot the Python Django modules.
- Used Visio for Data Modeling and Database Design.
- Resolved ongoing problems and accurately documented progress of the complete project.
- Provided support for existing applications.
- Developed release notes for deployment, which the support team used during deployment.
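A minimal Flask sketch of the kind of RESTful endpoint described above; the resource name and in-memory store are placeholders standing in for the real database-backed implementation.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# In-memory store standing in for the real database layer
NOTES = {}

@app.route("/notes", methods=["POST"])
def create_note():
    payload = request.get_json(force=True)
    note_id = len(NOTES) + 1
    NOTES[note_id] = {"id": note_id, "text": payload.get("text", "")}
    return jsonify(NOTES[note_id]), 201

@app.route("/notes/<int:note_id>", methods=["GET"])
def get_note(note_id):
    note = NOTES.get(note_id)
    if note is None:
        return jsonify({"error": "not found"}), 404
    return jsonify(note)

if __name__ == "__main__":
    app.run(debug=True)
```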
Environment: Python 2.7, Flask, Django, HTML, CSS, JavaScript, Linux, jQuery, SQL Server, Git, Shell Scripting
Confidential, DE
Python Developer
Responsibilities:
- Participated in requirement gathering and worked closely with the architect in designing and modeling.
- Involved in the design, development, and testing phases of the application using Agile methodology.
- Designed and maintained databases using Python and developed a Python-based RESTful API using Flask, SQLAlchemy, and PostgreSQL (see the sketch after this list).
- Designed and developed components using Python with the Django framework; implemented code in Python to retrieve and manipulate data.
- Developed consumer-facing features and applications using Python, Django, and HTML with Behavior-Driven Development (BDD) and pair programming.
- Worked closely with back-end developer to find ways to push the limits of existing Web technology.
- Designed and developed the UI for the website with HTML, XHTML, CSS, JavaScript, and AJAX.
- Wrote SQL queries and implemented cursors, object types, sequences, indexes, stored procedures, functions, packages, and triggers in SQL Server.
- Designed dynamic client-side JavaScript code to build web forms and performed simulations for web application pages.
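A small, assumed SQLAlchemy sketch of the database layer behind the Flask/PostgreSQL API above; the model, table, and connection string are illustrative only and require the PostgreSQL driver (psycopg2) to run.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Customer(Base):
    __tablename__ = "customers"
    id = Column(Integer, primary_key=True)
    name = Column(String(120), nullable=False)
    email = Column(String(255), unique=True)

# Placeholder connection string; the real service used PostgreSQL
engine = create_engine("postgresql://user:password@localhost/appdb")
Base.metadata.create_all(engine)

# Insert a sample row through an ORM session
Session = sessionmaker(bind=engine)
session = Session()
session.add(Customer(name="Test User", email="test@example.com"))
session.commit()
```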
Environment: Python 2.7, Django 1.4, web2py, Flask, Struts, JavaScript, AJAX, XML, SQL Server, HTML, XHTML, CSS, Git
Confidential, DE
Python Developer
Responsibilities:
- Responsible for gathering requirements, system analysis, design, development, testing and deployment.
- Participated in the complete SDLC process.
- Developed business logic using Python 2.7.
- Used the Django framework for database-layer development.
- Developed the user interface using CSS, HTML, JavaScript, and jQuery.
- Responsible for setting up a Python REST API framework using Django (see the sketch after this list).
- Created the database using MySQL and wrote several queries to extract data from it.
- Wrote scripts in Python for automation of testing jobs.
- Handled deployment and builds for various environments, including Linux and UNIX.
- Used Jira as the project management tool for issue and bug tracking.
- Effectively communicated with the external vendors to resolve queries.
- Implemented monitoring and established best practices around using Elasticsearch.
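A tiny, hypothetical Django view and URL pattern in the spirit of the REST endpoints above; the route and payload are placeholders, and the modern path() syntax is shown even though the project itself used Django 1.4.

```python
# A single-file sketch: in a real project these live in views.py and urls.py
from django.http import JsonResponse
from django.urls import path

def job_status(request, job_id):
    # Placeholder payload; the real view queried MySQL for job details
    return JsonResponse({"job_id": job_id, "status": "completed"})

urlpatterns = [
    path("api/jobs/<int:job_id>/", job_status),
]
```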
Environment: Python 2.7, Django 1.4, C++, Java, Jenkins, JSON, XML, SOAPUI, HTML, Restful API, Shell Scripting, SQL, MySQL, GIT, Linux.
Confidential, DE
Software Programmer
Responsibilities:
- Involved in the analysis, design and development of the project life cycle.
- Designed and developed Web pages (presentation layer) using Java/JSP.
- Coded client-side validations extensively using JavaScript.
- Involved in writing and modifying stored procedures, views, and tables in SQL Server database.
- Designed and developed various levels of security measures, such as data access and login privileges, according to the user's login level.
- Developed an API to write XML documents from a database. Utilized XML and XSL Transformation for dynamic web-content and database connectivity.
- Involved in business meetings to gather additional requirements for the application.
- Created numerous reports using SSRS.
- Involved in configuring and deploying the application using WebSphere.
- Involved in code reviews and mentored the team in resolving issues.
- Undertook integration and testing of the various parts of the application.
- Used Subversion for version control and log4j for logging errors.
- Performed code walkthroughs and prepared test cases and test plans.
Environment: Java, JSP, HTML, JavaScript, SQL Server, SSRS, Web Services.