Big Data Developer Resume
SUMMARY
- 6 years of professional experience spanning analysis, design, development, integration, deployment, and maintenance of quality software applications using Java/J2EE, Python, Scala, and Hadoop technologies.
- Experienced in installing, configuring, and testing Hadoop ecosystem components (Hive, Pig, Sqoop, etc.) on Linux/UNIX, including Hadoop administration.
- Expertise in Java, Hadoop MapReduce, Pig, Hive, Oozie, Sqoop, ZooKeeper, and NoSQL databases.
- Excellent experience in maintaining and optimizing AWS infrastructure (EMR, EC2, S3, EBS, Lambda, SQS, SNS, CloudWatch).
- Expertise in developing Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data (see the illustrative sketch at the end of this summary).
- Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and the MapReduce programming paradigm.
- Experienced working with Hadoop big data technologies (HDFS and MapReduce programs), ecosystem tools (HBase, Hive, Pig), and the NoSQL database MongoDB.
- Experienced with the column-oriented NoSQL database Cassandra.
- Extensive experience working with semi-structured and unstructured data by implementing complex MapReduce and Spark-based programs using design patterns.
- Hands-on programming experience in Java, J2EE, Python, and Scala.
- Expertise in loading data from different sources (Teradata and DB2) into HDFS using Sqoop and into partitioned Hive tables.
- Extensive experience writing custom MapReduce programs for data processing and UDFs for both Hive and Pig in Java.
- Extensive experience working with structured data using HiveQL: join operations, custom UDFs, and Hive query optimization.
- Experienced in importing and exporting data between HDFS and relational databases using Sqoop.
- Expertise in job workflow scheduling and monitoring tools like Oozie.
- Experience with Apache Flume for collecting, aggregating, and moving large volumes of data from various sources such as web servers and Telnet sources.
- Extensively designed and executed SQL queries to ensure data integrity and consistency at the backend.
- Strong experience architecting batch-style, large-scale distributed computing applications using tools such as Flume, MapReduce, and Hive.
- Worked on custom Pig loader and storage classes to handle a variety of data formats such as JSON and compressed CSV.
- Experience with scripting technologies such as Python and UNIX shell scripts.
- Strong experience in working with UNIX/LINUX environments, writing shell scripts.
- Excellent knowledge and working experience in Agile & Waterfall methodologies.
- Designed and developed data warehouse and Redshift-based BI solutions.
- Involved in designing Amazon Redshift clusters, schemas, and tables.
- Involved in writing complex SQL using window functions to extract data from Redshift without stored procedures.
- Applied Machine Learning and performed statistical analysis on the data.
- Scraped and analyzed data using Machine Learning algorithms in Python and SQL.
- Expertise in web page development using JSP, HTML, JavaScript, jQuery, and Ajax.
- Experience in writing database objects like Stored Procedures, Functions, Triggers, PL/SQL packages and Cursors for Oracle, SQL Server, and MySQL & Sybase databases.
- Worked extensively with EC2, configuring Elastic Load Balancers and creating/deploying CloudFormation templates for web application deployment.
- Great team player and quick learner with effective communication, motivation, and organizational skills, combined with attention to detail and a focus on business improvements.
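The following is a minimal, illustrative Spark SQL sketch of the Scala work summarized above; the table, columns, and S3 path are hypothetical placeholders rather than production code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object TransactionSummary {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("transaction-summary")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical partitioned Hive table of daily transactions
    val txns = spark.table("finance.transactions")

    // Window function: running total per account, ordered by transaction date
    val byAccount = Window.partitionBy("account_id").orderBy("txn_date")
    val summary = txns
      .filter(col("txn_date") >= "2020-01-01")
      .withColumn("running_total", sum(col("amount")).over(byAccount))

    // Land the result as partitioned Parquet for downstream reporting
    summary.write
      .mode("overwrite")
      .partitionBy("txn_date")
      .parquet("s3a://reports-bucket/transaction-summary/")

    spark.stop()
  }
}
```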
TECHNICAL SKILLS
- Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Scala, Spark, Storm, Kafka, RabbitMQ, ActiveMQ, ZooKeeper, Linux, UNIX, Windows
- NoSQL Databases: HBase, Cassandra, CouchDB, MongoDB
- Hadoop Distributions: Cloudera, Hortonworks, MapR
- Databases/ETL: Teradata, MS SQL Server, Oracle, Informix, Sybase, Informatica, DataStage
- Languages/Frameworks: Java, J2EE, Web Services (JAX-RPC, JAXP, JAXM), JMS, JNDI, Servlets, JSP, Jakarta Struts, Python
- Application Servers: BEA WebLogic, JBoss, Tomcat
- Design/Methodologies: UML, OOAD
PROFESSIONAL EXPERIENCE
Big Data Developer
Confidential
Responsibilities:
- Built a CI/CD pipeline and deployed code automatically into production through Jenkins.
- Integrated PagerDuty with a Scala-based framework to trigger alerts.
- Onboarded the application to a streaming data platform to listen to Kafka topics.
- Wrote Kafka consumer code in Scala to fetch streaming records and stored the data in AWS S3 for downstream consumption (see the consumer sketch at the end of this section).
- Developed a Python framework for batch processing of vendor feeds that performed data quality checks and control-file validation, tokenized sensitive data, and stored the output in S3 for downstream analysis.
- Converted D256 & 310 layout files to JSON format using Spark/Scala, applying transformations and business logic.
- Used a GPG encryption/decryption mechanism and the AWS S3 client (putObject, getObject) in Scala to upload and download files.
- Developed a Slack alert notification mechanism in Python for job failures.
- Worked on an external-facing web application, the Partnership Portal, using Java, JSP, and servlets.
- Scheduled batch feeds on Apache Airflow to trigger the data pipeline using feed-specific DAGs.
- Enhanced the Partnership Portal, a content delivery platform running on AWS; the web application is SSO-integrated and deployed on Tomcat.
- Performed data cleansing and transformed COBOL copybook data, converting source files from EBCDIC to ASCII.
- Tokenized sensitive data such as credit card and plastic numbers as part of the batch file ingestion mechanism using a Spark/Scala-based framework.
- Enhanced the Spark-based internal Turing application wrapper script to handle additional configurable variables.
- Converted shell scripts to Python scripts for source code analysis, such as detecting security vulnerabilities.
- Converted SAP BusinessObjects reports into Excel-based reports as part of the BusinessObjects decommissioning.
- Migrated the web application's single sign-on page from an on-premises instance to the cloud.
- Added S3 functionality to the Partnership Portal so partner users can view and download reports.
- Resolved application security findings in the web application such as cross-site scripting (XSS), SQL injection, and sensitive data exposure.
- Worked end to end on batch feeds such as settlement and rewards files, transforming data from ASCII to Parquet format to land it in OneLake (S3) and Snowflake (see the ingestion sketch at the end of this section).
- Securely stored and controlled access to tokens, passwords, certificates, and encryption keys using HashiCorp Vault and Envconsul via API and cURL calls.
- Attached schemas to ASCII-formatted files in the direct ingestion framework by calling a metadata management tool's API and converting the files to Parquet.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/big data concepts; worked as a Hadoop consultant on MapReduce, Pig, Hive, and Sqoop.
- Implemented Spark RDD transformations to map business logic and applied actions on top of those transformations.
- Developed a direct ingestion framework for AWS cross-account data movement in S3 using IAM roles and assumed STS roles.
Environment: Spark, Scala, Python, Jenkins, Hadoop, Maven, Gradle, Shell Scripting, Cloudera, AWS (S3, EMR, EC2, Lambda, SNS, SQS), SQL, Kubernetes, RDBMS, Java, HTML, JavaScript, Web Services, Redshift, Confluent Kafka, SonarQube, Checkmarx, HashiCorp Vault, Tomcat.
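A sketch of the Kafka-to-S3 consumer pattern described in this section; the broker, topic, bucket, and key layout are hypothetical, and per-record writes are kept only for brevity.

```scala
import java.time.Duration
import java.util.{Collections, Properties, UUID}
import scala.jdk.CollectionConverters._

import com.amazonaws.services.s3.AmazonS3ClientBuilder
import org.apache.kafka.clients.consumer.KafkaConsumer

object FeedConsumer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092") // hypothetical broker
    props.put("group.id", "feed-consumer")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("enable.auto.commit", "false")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("vendor-feed")) // hypothetical topic

    val s3 = AmazonS3ClientBuilder.defaultClient()
    while (true) {
      val records = consumer.poll(Duration.ofSeconds(5)).asScala
      for (record <- records) {
        // Land each record in S3 for downstream consumption (hypothetical bucket and key scheme)
        val key = s"vendor-feed/${record.partition()}/${record.offset()}-${UUID.randomUUID()}.json"
        s3.putObject("landing-bucket", key, record.value())
      }
      consumer.commitSync()
    }
  }
}
```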
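And a sketch of the ASCII-to-Parquet step in the direct ingestion flow referenced above; the schema is hard-coded here as an assumption (the actual framework attaches it via the metadata management API), and all paths and columns are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object DirectIngestion {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("direct-ingestion").getOrCreate()

    // Hypothetical schema; in practice this comes from the metadata management tool
    val schema = StructType(Seq(
      StructField("settlement_id", StringType),
      StructField("plastic_token", StringType),
      StructField("amount", DecimalType(18, 2)),
      StructField("settle_date", DateType)
    ))

    // Read the pipe-delimited ASCII feed with the attached schema
    val feed = spark.read
      .option("sep", "|")
      .option("dateFormat", "yyyy-MM-dd")
      .schema(schema)
      .csv("s3a://landing-bucket/settlement/current/")

    // Land the data as Parquet in the OneLake S3 zone for Snowflake ingestion
    feed.write
      .mode("append")
      .partitionBy("settle_date")
      .parquet("s3a://onelake-bucket/settlement/")

    spark.stop()
  }
}
```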
Sr. Java-BigData Developer
Confidential - Milwaukee, WI
Responsibilities:
- Involved in the full project life cycle: design, analysis, logical and physical architecture modeling, development, implementation, and testing.
- Created a NiFi processor to move Cassandra data into the MySQL core ontology.
- Developed a POC environment using NiFi to orchestrate the workflow from Kafka to MySQL.
- Reverse-engineered Informatica mappings into Java using custom NiFi processors.
- Used Kafka for durable event and batch communication.
- Developed orchestration logic to call multiple REST service endpoints using Java.
- Consumed REST APIs to inspect requests and responses.
- Replicated the vendor API model into the in-house NM legacy model.
- Created data model objects, converted requests and responses from the vendor API model, and consumed the REST service through a verticle (see the client sketch at the end of this section).
- Ran the REST API model through a CI/CD pipeline using Jenkins on a Kubernetes cluster.
- Converted Informatica XML mappings to Java.
- Analyzed requirements and developed the unit test framework.
- Integrated tests with SonarQube and monitored code coverage.
- Validated RESTful API services.
- Designed and documented REST APIs, including JSON data formats and an API versioning strategy.
Environment: Hadoop, HDFS, Pig, Sqoop, MySQL, Maven, Gradle, Shell Scripting, CDH (Cloudera), Cassandra, AWS (S3, EMR), SQL, Kubernetes, Spark, RDBMS, Java, HTML, NiFi, JavaScript, Web Services, CI/CD, Redshift, Kafka, SonarQube, Microservices.
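An illustrative sketch of the vendor-API consumption and model conversion described in this section; it is written in Scala for consistency with the other sketches (the project itself used Java), and the endpoint, fields, and model names are hypothetical.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Hypothetical vendor and legacy representations of a policy record
case class VendorPolicy(policyRef: String, holder: String, premiumCents: Long)
case class LegacyPolicy(policyNumber: String, insuredName: String, annualPremium: BigDecimal)

object VendorClient {
  // Map the vendor API model onto the in-house legacy model
  def toLegacy(v: VendorPolicy): LegacyPolicy =
    LegacyPolicy(v.policyRef, v.holder, BigDecimal(v.premiumCents) / 100)

  def main(args: Array[String]): Unit = {
    val client = HttpClient.newHttpClient()
    val request = HttpRequest.newBuilder()
      .uri(URI.create("https://vendor.example.com/api/v1/policies/12345")) // hypothetical endpoint
      .header("Accept", "application/json")
      .GET()
      .build()

    // Inspect the raw request/response before wiring the call into the verticle
    val response = client.send(request, HttpResponse.BodyHandlers.ofString())
    println(s"status=${response.statusCode()} body=${response.body()}")

    // JSON parsing of response.body() into VendorPolicy is omitted; a JSON library would be used here
    val sample = VendorPolicy("POL-12345", "Jane Doe", 125000L)
    println(toLegacy(sample))
  }
}
```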
Big Data/Hadoop Developer
Confidential
Responsibilities:
- Involved in the full project life cycle: design, analysis, logical and physical architecture modeling, development, implementation, and testing.
- Developed MapReduce (YARN) jobs for cleansing, accessing, and validating the data.
- Created and worked on Sqoop jobs with incremental load to populate Hive External tables.
- Designed and developed automation test scripts using Python.
- Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
- Analyzed SQL scripts and designed a solution to implement them using PySpark.
- Implemented Hive generic UDFs to incorporate business logic into Hive queries.
- Responsible for developing a data pipeline on AWS to extract data from web logs and store it in HDFS.
- Analyzed the web log data using HiveQL to extract unique visitors per day, page views, visit duration, and the most visited pages (see the sketch at the end of this section).
- Worked on MongoDB using CRUD operations (create, read, update, delete), indexing, replication, and sharding features.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
- Created Hive tables and worked on them using HiveQL.
- Designed and implemented static and dynamic partitioning and bucketing in Hive.
- Developed multiple POCs using PySpark, deployed them on the YARN cluster, compared the performance of Spark with Hive and SQL, and was involved in end-to-end implementation of the ETL logic.
- Worked on cluster coordination services through ZooKeeper.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Exported the analyzed data to an RDBMS using Sqoop to generate reports for the BI team.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning.
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Maven, Python, Shell Scripting, CDH, MongoDB, HBase, Cloudera, AWS (S3, EMR), SQL, Scala, Spark, PySpark, RDBMS, Java, HTML, NiFi, JavaScript, Web Services, Flink, Redshift, Kafka, Storm, Talend, Microservices.
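A sketch of the weblog analysis referenced in this section; the original work used HiveQL directly, and it is shown here through Spark SQL in Scala for consistency with the other sketches, with a hypothetical table layout that also illustrates the partitioning and bucketing bullet.

```scala
import org.apache.spark.sql.SparkSession

object WeblogStats {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("weblog-stats")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical partitioned, bucketed Hive table of parsed web log events
    spark.sql("""
      CREATE TABLE IF NOT EXISTS web.page_views (
        visitor_id STRING,
        page       STRING,
        duration_s INT
      )
      PARTITIONED BY (view_date STRING)
      CLUSTERED BY (visitor_id) INTO 16 BUCKETS
      STORED AS ORC
    """)

    // Unique visitors, page views, and average visit duration per day
    val daily = spark.sql("""
      SELECT view_date,
             COUNT(DISTINCT visitor_id) AS unique_visitors,
             COUNT(*)                   AS page_views,
             AVG(duration_s)            AS avg_visit_duration_s
      FROM web.page_views
      GROUP BY view_date
    """)
    daily.show(30, truncate = false)

    spark.stop()
  }
}
```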
Java/j2ee developer
Confidential
Responsibilities:
- Used JSF framework to implement MVC design pattern.
- Developed and coordinated complex, high-quality solutions for clients using J2SE, J2EE, Servlets, JSP, HTML, Struts, Spring MVC, SOAP, JavaScript, jQuery, JSON, and XML.
- Wrote JSF managed beans, converters, and validators following framework standards and used explicit and implicit navigations for page navigations.
- Designed and developed Persistence layer components using Hibernate ORM tool.
- Designed the UI using JSF tags, Apache Tomahawk, and RichFaces.
- Used Oracle 10g as the backend to store and fetch data.
- Experienced in using IDEs such as Eclipse and NetBeans, with Maven integration.
- Created real-time reporting systems and dashboards using XML, MySQL, and Perl.
- Worked on RESTful web services that enforced a stateless client-server model and supported JSON (migrating a few services from SOAP to REST).
- Involved in detailed analysis based on the requirement documents.
- Involved in design, development, and testing of web application and integration projects using object-oriented technologies such as Core Java, J2EE, Struts, JSP, JDBC, Spring Framework, Hibernate, JavaBeans, Web Services (REST/SOAP), XML, XSLT, XSL, and Ant.
- Designed and implemented SOA-compliant management and metrics infrastructure for the Mule ESB, utilizing the SOA management components.
- Used Node.js for server-side rendering and implemented Node.js modules to integrate with designs and requirements.
- Used JAX-WS for interaction between the front-end and back-end modules, which run on two different servers.
- Responsible for onshore deliverables; provided design/technical help to the team and reviewed work to meet quality standards and timelines.
- Migrated existing Struts application to Spring MVC framework.
- Provided and implemented numerous solution ideas to improve the performance and stabilize the application.
- Extensively used LDAP with Microsoft Active Directory for user authentication at login.
- Developed unit test cases using JUnit.
- Created the project from scratch using AngularJS for the frontend and Node.js/Express for the backend.
- Used Tomcat as the web server to deploy OMS web applications.
- Used the SOAP Lite module to communicate with different web services based on the given WSDL.
- Prepared technical reports and documentation manuals during program development.
Environment: JDK 1.8, JSF, Hibernate 3.0, JIRA, Node.js, CruiseControl, Log4j, Tomcat, LDAP, JUnit, NetBeans, Windows/UNIX.