Big Data Developer Resume
Madison, WI
SUMMARY
- 7 years of experience in full life cycle development involving analysis, design, development, testing, documentation, implementation, and maintenance of application software in web-based, distributed n-tier environments, along with experience in mentoring and coaching.
- Extensively worked on coding using core Java/J2EE concepts like multithreading, collections, data structures, algorithms, generics, network APIs, and database connections, and proficient in all layers of a multi-tier application.
- Experienced in using Java IDE tools like Eclipse, IntelliJ, RSA, and NetBeans.
- Expertise in developing user interface applications with HTML, HTML5, CSS, JavaScript, jQuery, XML, AJAX.
- Good knowledge of developing RESTful web services using Spring MVC and Tomcat/GlassFish servers.
- Implemented applications using Spring IoC, Spring MVC, Spring Batch, and Spring Boot, and handled security using Spring Security.
- Experience in using and tuning relational databases like Confidential SQL Server, Confidential, MySQL and columnar databases like Confidential Redshift, Confidential SQL Data Warehouse.
- Experience working with the ELK stack for storing logs, and created production-level AWS infrastructure using Terraform.
- Experience in designing and deploying several applications utilizing almost all of the AWS stack (including EC2, S3, RDS, DynamoDB, SNS, SQS, and IAM), focusing on high availability, fault tolerance, and auto-scaling with AWS CloudFormation.
- Expertise in implementing ad-hoc queries using HiveQL, and in importing and exporting data using Sqoop between HDFS, Hive, and HBase and relational database systems like Confidential and Teradata.
- Proficiency in Spark for loading data from relational and NoSQL databases using Spark SQL and building big data applications using Apache Hadoop.
- Deep understanding of Apache Spark and its components, including Spark Core and Spark Streaming, for better analysis and processing of data.
- Extensive experience working with structured data using HiveQL, including join operations and writing custom UDFs, and experienced in optimizing Hive queries.
- Efficient in analyzing data using HiveQL and Pig Latin, partitioning existing data sets with static and dynamic partitions, and tuning data for optimal query performance (a partitioning sketch follows this summary).
- Extensive experience in requirements gathering, analysis, design, reviews, coding and code reviews, unit and integration testing, UNIX, and shell scripting.
- Extensive experience with ticketing and tracking tools: JIRA for production hotfixes and bug fixes, and Confluence for documenting application process flows.
- Hands-on experience in Docker-based container deployments to create self-contained environments for dev teams, and managed the clusters using Kubernetes; evaluated Kubernetes for Docker container orchestration.
- Experience in writing deployment scripts using Ant and Maven; deployed applications on Tomcat, WebSphere, and JBoss.
- Experience working with on-premises network, application, and server monitoring tools like Nagios, Splunk, and AppDynamics, and with the CloudWatch monitoring tool on AWS.
- Experience working with ServiceNow for incident reporting and for creating and updating incidents that occur around the Governance Division.
- Expertise in following the Agile process in application development, with good knowledge of Agile methodology and the Scrum process.
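A minimal sketch of the static/dynamic Hive partitioning work described above, shown in Scala through Spark's Hive support. The analytics.events_by_day and staging.raw_events tables, their columns, and the storage format are hypothetical placeholders rather than project artifacts.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioned-load")
      .enableHiveSupport()
      .getOrCreate()

    // Allow dynamic partitions so each distinct event_date lands in its own partition
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Hypothetical target table, partitioned by event_date
    spark.sql(
      """CREATE TABLE IF NOT EXISTS analytics.events_by_day (
        |  user_id STRING,
        |  action  STRING,
        |  amount  DOUBLE)
        |PARTITIONED BY (event_date STRING)
        |STORED AS PARQUET""".stripMargin)

    // Dynamic-partition insert from a hypothetical staging table;
    // the partition column must be the last column in the SELECT
    spark.sql(
      """INSERT OVERWRITE TABLE analytics.events_by_day PARTITION (event_date)
        |SELECT user_id, action, amount, event_date
        |FROM staging.raw_events""".stripMargin)

    spark.stop()
  }
}
```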
PROFESSIONAL EXPERIENCE
Confidential, Madison, WI
Big Data Developer
Responsibilities:
- Strong understanding of concurrency patterns, data structures and algorithms, domain-driven design, and microservices patterns and architectures in Java applications.
- Implementing REST messages for communication between web service client and service provider.
- Developing RESTful web services for transmission of data in Confidential format.
- Exporting data from DB2 to HDFS using Sqoop and developing MapReduce jobs using the Java API.
- Designing and implementing Java engine and API to perform direct calls from front-end JavaScript to server-side Java methods.
- Working on Data Lake architecture to build a reliable, scalable, analytics platform to meet batch, interactive and on-line analytics requirements.
- Involving in the process of data acquisition, data pre-processing and data exploration of project in Spark using Scala.
- Developing Spark code using Java, DataFrames, and Spark SQL for faster processing of data, relying extensively on DataFrames and Spark SQL.
- Performing advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark with Scala, and developing Scala and Spark SQL code to extract data from various databases.
- Using Spark SQL to process large amounts of structured data and implementing Spark RDD transformations and actions to migrate MapReduce algorithms.
- Using numerous user-defined functions in Hive to implement complex business logic in feed generation.
- Creating reusable Python scripts and adding them to the distributed cache in Hive to generate fixed-width data files using an offset file.
- Creating tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs; importing and exporting data into HDFS and Hive using Sqoop.
- Creating partitioned tables in Hive and using Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Creating functions and assigning roles in AWS Lambda to run Python scripts, and using AWS Lambda with Java to perform event-driven processing.
- Building servers using AWS: importing volumes, launching EC2 and RDS instances, and creating security groups, auto-scaling groups, and load balancers (ELBs) in the defined virtual private cloud (VPC).
- Implementing reprocessing of failed messages in Kafka using offset IDs and writing Kafka API calls to process messages smoothly on the Kafka cluster (see the offset-seek sketch after this list).
- Installing a KSQL cluster on the Confluent Cloud platform and using KSQL to transform data within the pipelines, readying messages to land cleanly in the downstream databases and systems.
- Implementing a log producer in Scala that watches application logs, transforms incremental log entries, and sends them to a Kafka and ZooKeeper based log collection platform.
- Integrating Kubernetes with networking, storage, and security to provide comprehensive infrastructure, and orchestrating Kubernetes containers across multiple hosts.
- Implementing Jenkins and building pipelines to drive all microservice builds out to the Docker registry and deploy them to Kubernetes.
- Conducting meetings with business and development teams for data validation and end-to-end data mapping.
- Strong experience working in an integrated Agile environment across all phases of the Software Development Lifecycle (SDLC), including statistical data analysis and hypothesis testing.
- Working with analysis tools like Tableau for regression analysis, pie charts, and bar graphs.
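A minimal sketch of the offset-based reprocessing mentioned above, written in Scala against the standard Kafka consumer API. The broker address, topic, partition, consumer group, and recorded offset are hypothetical; the point is the assign/seek pattern that rewinds a consumer to the offset ID recorded when processing failed.

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition
import scala.jdk.CollectionConverters._

object ReplayFailedMessages {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")    // assumed broker address
    props.put("group.id", "payments-replay")          // hypothetical consumer group
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("enable.auto.commit", "false")

    val consumer = new KafkaConsumer[String, String](props)
    val partition = new TopicPartition("payments", 0) // hypothetical topic and partition
    val failedOffset = 42137L                         // offset id recorded when processing failed

    consumer.assign(Collections.singletonList(partition))
    consumer.seek(partition, failedOffset)            // rewind to the failed message

    val records = consumer.poll(Duration.ofSeconds(5))
    records.asScala.foreach { r =>
      println(s"reprocessing offset=${r.offset()} key=${r.key()} value=${r.value()}")
    }
    consumer.commitSync()                             // commit only after successful reprocessing
    consumer.close()
  }
}
```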
Environment: Java, Confidential, Hive, SQL, Spark, Kafka, AWS, REST API, Docker, Kubernetes, Jenkins.
Confidential, Columbus, OH
Data Engineer
Responsibilities:
- Involved in the design and development of a web interface using JSP, Servlets, JavaScript, and JDBC for administering and managing users and clients.
- Developed REST services using Confidential for storing and exchanging information between browsers and servers.
- Implemented Java Message Service (JMS) and backend MQ messaging for asynchronous exchange of payment processing data.
- Developed Java Server Pages (JSP) and PL/SQL procedures and functions to perform business transactions; involved in the implementation of business logic using the Struts framework and Hibernate.
- Involved in the creation of database objects like tables, views, stored procedures, functions, packages, DB triggers, and indexes using Confidential tools like TOAD, PL/SQL, and SQL*Plus.
- Converted existing Terraform modules that had version conflicts to utilize CloudFormation during Terraform deployments, enabling more control and covering missing capabilities.
- Worked intensively on data ingestion and integration into Hadoop from various sources like SAP, Confidential, SQL Server, and the EDW.
- Involved in the development of Java and Spark SQL jobs to handle large volumes of ETL workload.
- Worked on Sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement, and developed Hive scripts (HQL) to automate the joins across different sources.
- Worked on creating custom NiFi flows for batch processing. The data pipeline includes Apache Spark, NiFi, and Hive.
- Involved in file movements between HDFS and AWS S3 using NiFi.
- Responsible for Continuous Integration (CI) and Continuous Delivery (CD) process implementation using Jenkins along with PowerShell to automate routine jobs.
- Wrote Lambda functions in Python for AWS Lambda and invoked PowerShell scripts for data transformations and analytics on large data sets in EMR clusters and AWS Kinesis data streams.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data, and used Avro and Parquet file formats for data serialization.
- Set up SQL Workbench with the JDBC driver and ran commands to transfer data into the Redshift database.
- Wrote scripts and an indexing strategy for a migration from SQL Server and MySQL databases to Confidential Redshift.
- Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the data in Parquet format on HDFS (see the streaming sketch after this list).
- Used Spark SQL to read the Parquet data and create the tables in Hive using the Scala API.
- Provisioned highly available EC2 instances using Terraform and CloudFormation, and wrote new Python scripts to support new functionality in Terraform.
- Worked as an RLC (Regulatory and Legal Compliance) team member and undertook user stories (tasks) with critical deadlines in Agile environment.
- Conducted daily stand-ups with the offshore team, updating them on applicable tasks and getting updates for the onshore team on a day-to-day basis.
- Coordinated with offshore and onsite teams to understand the requirements and prepared high-level and low-level design documents from the requirements specification.
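A minimal sketch of the real-time flow described above (Kafka feed -> RDD -> DataFrame -> Parquet on HDFS), assuming the spark-streaming-kafka-0-10 integration. The broker, the trades topic, and the HDFS path are placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object FeedToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("feed-to-parquet").getOrCreate()
    val ssc = new StreamingContext(spark.sparkContext, Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",            // assumed broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "feed-to-parquet",
      "auto.offset.reset" -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("trades"), kafkaParams))

    // Each micro-batch arrives as an RDD; convert it to a DataFrame and append as Parquet
    stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        import spark.implicits._
        val df = rdd.map(_.value()).toDF("payload")
        df.write.mode("append").parquet("hdfs:///data/feeds/trades_parquet")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```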
Environment: Spark, Java, REST, HDFS, Hive, Kafka, AWS, Avro, Parquet, NiFi, Linux.
Confidential, Chicago, IL
Data Engineer
Responsibilities:
- Involved in requirements gathering, analysis, and design of the application, and created use case diagrams, class diagrams, and sequence diagrams using Rational Rose.
- Designed and developed web interfaces using MVC Architecture and Spring Framework.
- Implemented Spring Framework for dependency injection to inject appropriate class objects depending on the source of the application process.
- Developed the application using Spring MVC and AJAX on the presentation layer; the business layer is built using Spring and the persistence layer uses Hibernate.
- Developed views and controllers for client and manager modules using Spring MVC and Spring Core, and used Spring Security for securing web-tier access.
- Designed and developed Spark jobs to enrich the clickstream data, implemented Spark jobs using Java, and used Spark SQL to load Hive tables into Spark for faster processing of data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Java.
- Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (see the sketch after this list).
- Implemented Sqoop jobs to perform full and incremental imports of data from relational tables into Hadoop and Hive tables in formats such as text, Avro, Parquet, and SequenceFile.
- Developed shell scripts to periodically perform incremental import of data from third party API to Confidential AWS.
- Created HBase tables to load large sets of structured, semi structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
- Experienced in troubleshooting various Spark applications using spark-shell, spark-submit.
- Implemented data integrity and data quality checks in Hadoop using Hive and Linux Scripts.
- Used Impala to analyze the data ingested into HBase and compute various metrics for reporting on the dashboard.
- Experienced in using Tableau Desktop to represent the data from various sources to access the data easily for business and end users.
- Responsible for the analysis, design, and testing phases, and responsible for documenting technical specifications.
- Coordinated effectively with the offshore team and managed project deliverables on time.
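A minimal sketch of the Spark Streaming to HBase path described above. The Kafka broker and topic, the clicks HBase table, and its d column family are hypothetical, the HBase configuration is assumed to be on the executor classpath, and messages are assumed to be keyed so the Kafka key can serve as the row key.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object StreamToHBase {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("stream-to-hbase"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",            // assumed broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "stream-to-hbase")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("clicks"), kafkaParams))

    // Write each partition of each micro-batch to a hypothetical HBase table "clicks"
    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = conn.getTable(TableName.valueOf("clicks"))
        records.foreach { r =>
          val put = new Put(Bytes.toBytes(r.key()))     // row key = Kafka message key
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(r.value()))
          table.put(put)
        }
        table.close()
        conn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```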
Environment: Spark, Java, Hibernate, HDFS, Hive, Sqoop, Oozie, Kafka, AWS, Tableau, Avro, Parquet, Linux.
Confidential, Stamford, CT
Hadoop Developer
Responsibilities:
- Developed a Financial Model Engine for the sales Department on Big Data infrastructure using Scala and Spark.
- Involved in migrating Hive queries into Spark transformations using DataFrames, Spark SQL, SQLContext, and Scala (see the sketch after this list).
- Prepared technical documentation of the POC with all the details of installation, configuration, issues faced and their resolutions, Pig scripts, Hive queries, and the process for executing them.
- Analyzed the data using HiveQL to identify the different correlations and used core Java technologies to create Hive/Pig UDFs to use in the project.
- Worked with open-source communities to commit code, review code, drive enhancements and with data center teams on testing and deployment.
- Evaluated the data import/export capabilities and data analysis performance of the Apache Hadoop framework.
- Worked closely with the data modelers to model the new incoming data sets; developed and maintained HiveQL, Pig Latin scripts, Scala, and MapReduce code, and wrote MapReduce jobs using Scala.
- Developed Spark SQL script for handling different data sets and verified its performance over MR jobs.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Imported data from the local file system and RDBMS into HDFS using Sqoop, and developed workflows in Oozie to automate the tasks of loading the data into HDFS.
- Evaluated various data processing techniques available in Hadoop from various perspectives to detect aberrations in data, provide output to the BI tools, etc.
- Cleaned up the input data, specified the schema, processed the records, wrote UDFs, and generated the output data using Pig and Hive.
- Created MapReduce jobs for performing ETL transformations on the transactional and application-specific data sources.
- Compared the execution times for the functionality that needed joins between multiple data sets using MapReduce, Pig and Hive.
- Used Apache Spark to execute Scala source code for Confidential data processing and developed code to process it.
- Used real-time and batch processing to detect and discover customer buying patterns from historical data, and then monitored customer activity to optimize the customer experience, leading to more sales and happier customers.
- Compared the performance of the Hadoop based system to the existing processes used for preparing the data for analysis.
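A minimal sketch of migrating a Hive query into Spark as described above, showing the same hypothetical join/aggregation once as HiveQL run through Spark SQL and once as DataFrame transformations. SparkSession is used here in place of the older SQLContext API; the sales, products, and reports.units_by_category tables are placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSpark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark")
      .enableHiveSupport()
      .getOrCreate()

    // The original HiveQL, now executed by the Spark SQL engine
    val unitsByCategorySql = spark.sql(
      """SELECT p.category, SUM(s.quantity) AS units
        |FROM sales s JOIN products p ON s.product_id = p.product_id
        |GROUP BY p.category""".stripMargin)

    // The same logic expressed as DataFrame transformations
    val sales = spark.table("sales")
    val products = spark.table("products")
    val unitsByCategory = sales
      .join(products, "product_id")
      .groupBy("category")
      .agg(sum("quantity").as("units"))

    unitsByCategory.write.mode("overwrite").saveAsTable("reports.units_by_category")
    spark.stop()
  }
}
```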
Environment: CDH5, Hadoop, HDFS, MapReduce, Hive, Oozie, Sqoop, Linux, Java, Spark, Scala, SBT, Eclipse, JD Edwards Enterprise One.
Confidential, Lombard, IL
Hadoop Developer
Responsibilities:
- Responsible for the design and development of Spark SQL scripts based on functional specifications.
- Wrote MapReduce jobs to parse the web logs stored in HDFS.
- Imported and exported data into HDFS and Hive using Sqoop, and was responsible for loading data from UNIX file systems to HDFS.
- Configured different topologies for the Spark cluster and deployed them on a regular basis.
- Explored Spark for improving the performance and optimization of the existing algorithms in Hadoop.
- Wrote UDFs (user-defined functions) in Pig and Hive when needed and developed Pig scripts for processing data.
- Managed workflow and scheduling for complex MapReduce jobs using Apache Oozie.
- Responsible for Spark Streaming configuration based on type of Input Source.
- Performed optimization of MapReduce for effective usage of HDFS through compression techniques.
- Performed validation and standardization of raw data from XML and Confidential files with Pig and MapReduce.
- Implemented complex MapReduce programs to perform joins on the Map side using Distributed Cache in Java.
- Involved in performance tuning of Hive from design, storage, and query perspectives.
- Developed customized classes for serialization and Deserialization in Hadoop.
- Collected the Confidential data from an HTTP source and developed Spark APIs that help perform inserts and updates in Hive tables.
- Involved in performing linear regression using the Scala API and Spark (see the sketch after this list).
- Used Jira for bug tracking and Bitbucket to check-in and checkout code changes.
- Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
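A minimal sketch of linear regression with the Spark ML Scala API, as mentioned above. The input path, the feature columns, and the price label column are hypothetical.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

object PriceRegression {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("price-regression").getOrCreate()

    // Hypothetical training data already landed on HDFS as Parquet
    val raw = spark.read.parquet("hdfs:///data/model/training")

    // Assemble numeric columns into the single features vector spark.ml expects
    val assembler = new VectorAssembler()
      .setInputCols(Array("sq_feet", "bedrooms", "age_years")) // hypothetical feature columns
      .setOutputCol("features")
    val data = assembler.transform(raw).select("features", "price")

    val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42L)

    val lr = new LinearRegression()
      .setLabelCol("price")
      .setFeaturesCol("features")
      .setMaxIter(50)
      .setRegParam(0.1)

    val model = lr.fit(train)
    model.transform(test).select("price", "prediction").show(10)
    println(s"coefficients=${model.coefficients} intercept=${model.intercept}")

    spark.stop()
  }
}
```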
Environment: Hive, Java, Spark, Oozie, Unix, Cloudera, Flume, Sqoop, HDFS, Tomcat, Eclipse, Scala, HBase.
Confidential, Chicago, IL
Software Engineer
Responsibilities:
- Developed the user interface screens using Swing for accepting various system inputs such as contractual terms and monthly data pertaining to production, inventory, and transportation.
- Created and maintained Technical documentation for launching Cloudera Hadoop Clusters and for executing Hive queries and Pig Scripts.
- Experienced with CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
- Experienced in defining, designing, and developing Java applications, especially using Hadoop MapReduce by leveraging frameworks such as Cascading and Hive.
- Experienced in developing monitoring and performance metrics for Hadoop clusters.
- Experienced in documenting designs and procedures for building and managing Hadoop clusters.
- Strong experience in troubleshooting the operating system, maintaining the cluster, and resolving Java-related bugs.
- Experienced in importing/exporting data into HDFS/Hive from relational databases and Teradata using Sqoop.
- Involved in Creating, Upgrading, and Decommissioning of Cassandra clusters.
- Involved in working on the Cassandra database to analyze how the data gets stored.
- Successfully loaded files to Hive and HDFS from MongoDB and Solr.
- Experienced in automating deployment, management, and self-serve troubleshooting of applications.
- Defined and evolved the existing architecture to scale with growing data volume, users, and usage.
- Designed and developed a Java API (Commerce API) that provided functionality to connect to Cassandra through Java services (see the sketch after this list).
- Installed and configured Hive and written Hive UDFs.
- Experienced in managing CVS and migrating it to Subversion.
- Experienced in managing development time, bug tracking, project releases, development speed, release forecasting, scheduling, and more.
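A minimal sketch, in the spirit of the Commerce API mentioned above, of a Cassandra accessor written in Scala against the DataStax Java driver (3.x-style API). The contact point, commerce keyspace, orders table, and its columns are hypothetical.

```scala
import com.datastax.driver.core.{Cluster, Row, Session}
import scala.jdk.CollectionConverters._

/** Hypothetical service-layer accessor for a Cassandra "orders" table. */
class CommerceDao(contactPoint: String) {
  private val cluster: Cluster = Cluster.builder().addContactPoint(contactPoint).build()
  private val session: Session = cluster.connect("commerce") // assumed keyspace

  private val selectByCustomer =
    session.prepare("SELECT order_id, total FROM orders WHERE customer_id = ?")

  def ordersFor(customerId: String): List[(String, BigDecimal)] =
    session.execute(selectByCustomer.bind(customerId))
      .all().asScala.toList
      .map((r: Row) => (r.getString("order_id"), BigDecimal(r.getDecimal("total"))))

  def close(): Unit = {
    session.close()
    cluster.close()
  }
}

object CommerceDao {
  def main(args: Array[String]): Unit = {
    val dao = new CommerceDao("127.0.0.1")
    dao.ordersFor("c-1001").foreach { case (id, total) => println(s"$id -> $total") }
    dao.close()
  }
}
```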
Environment: Hadoop, HDFS, Hive, Flume, Sqoop, HBase, Eclipse, MySQL, Ubuntu, ZooKeeper, Java (JDK 1.6)
Confidential
Java Developer
Responsibilities:
- Involved in designing database connections using JDBC.
- Involved in design and development of UI using HTML, JavaScript, and CSS.
- Involved in creating tables and stored procedures in SQL for data manipulation and retrieval using SQL Server 2000, and in database modification using SQL, PL/SQL, triggers, and views in Confidential.
- Built the applications using the Ant tool and used Eclipse as the IDE.
- Involved in the logical and physical database design and implemented it by creating suitable tables, views, and triggers.
- Applied J2EE design patterns like business delegate, DAO, and singleton.
- Created the related procedures and functions used by JDBC calls in the above requirements (see the JDBC sketch after this list).
- Actively involved in testing, debugging and deployment of the application on WebLogic application server.
- Developed test cases and performed unit testing using JUnit.
- Involved in fixing bugs and minor enhancements for the front-end modules.
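A minimal sketch of the JDBC plumbing behind the procedures and functions mentioned above: opening a connection and invoking a stored procedure through a CallableStatement. The connection URL, credentials, and the get_order_status procedure are hypothetical.

```scala
import java.sql.{CallableStatement, Connection, DriverManager, Types}

object OrderStatusClient {
  def main(args: Array[String]): Unit = {
    // Hypothetical connection details, for illustration only
    val url = "jdbc:sqlserver://db-host:1433;databaseName=orders"
    val conn: Connection = DriverManager.getConnection(url, "app_user", "secret")

    try {
      // Call a hypothetical stored procedure: get_order_status(IN order_id, OUT status)
      val call: CallableStatement = conn.prepareCall("{call get_order_status(?, ?)}")
      call.setInt(1, 1001)
      call.registerOutParameter(2, Types.VARCHAR)
      call.execute()
      println(s"order 1001 status = ${call.getString(2)}")
      call.close()
    } finally {
      conn.close()
    }
  }
}
```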
Environment: Java, HTML, JavaScript, CSS, Confidential, JDBC, Ant, SQL, Swing, and Eclipse