Sr. Big Data/Hadoop Developer Resume
Irvine, TX
SUMMARY
- 9+ years of experience in IT, including analysis, design, and development of Big Data solutions using Hadoop, design and development of web applications using Java/J2EE, and database and data warehouse development using MySQL, Oracle, and Informatica.
- 5+ years of work experience in Big Data analytics, with hands-on experience installing, configuring, and using ecosystem components such as Hadoop MapReduce, HDFS, HBase, ZooKeeper, Hive, Sqoop, Pig, Flume, Cassandra, Kafka, and Spark.
- Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, DataNode, MapReduce concepts, and the HDFS framework.
- Experience using Cloudera Manager for installation and management of single-node and multi-node Hadoop clusters (CDH4 & CDH5).
- IT industry experience across various technologies, tools, and databases, including Big Data, AWS, S3, Snowflake, Hadoop, Hive, Spark, Python, Sqoop, CDL (Cassandra), Teradata, E-R (Confidential), Tableau, SQL, PL/SQL, Ab Initio (ACE), and Redshift, always with a focus on the data world.
- Developed a framework for converting existing PowerCenter mappings to PySpark (Python and Spark) jobs.
- Provided guidance to the development team working on PySpark as an ETL platform.
- Extensive experience in DevOps implementation in Azure, automating infrastructure creation using PowerShell DSC. Designed, configured, and managed containerized application infrastructure utilizing Microsoft Azure Containers and AKS.
- Experience with Google Cloud components, Google Container Builder, GCP client libraries, and the Cloud SDK.
- Deployed applications to GCP using Spinnaker (RPM-based).
- SQL Server Developer/Data Analyst/DBA (SQL/T-SQL/SSAS/SSIS/SSRS/Azure) with 7+ years of experience in MS SQL Server 2019/2017/2014/2012/2008 R2/2008 database administration in production, test, and development environments.
- Experience in data load management, importing and exporting data using Sqoop and Flume.
- Experience in analyzing data using Hive, Pig, and custom MapReduce programs in Java.
- Experience in scheduling and monitoring jobs using Oozie and Zookeeper.
- Experienced in writing MapReduce programs and UDFs for both Pig and Hive in Java.
- Experience in dealing with log files to extract data and copy it into HDFS using Flume.
- Developed Hadoop test classes using MRUnit for checking input and output.
- Experience in integrating Hive and HBase for effective operations.
- Developed Pig UDFs to pre-process the data for analysis.
- Experience in Impala, Solr, MongoDB, HBase and Spark.
- Hands-on experience with GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil, the bq command-line utility, Dataproc, and Stackdriver.
- Hands on knowledge of writing code in Scala.
- Proficient in Core Java, J2EE, JDBC, Servlets, JSP, Exception Handling, Multithreading, EJB, XML, HTML5, CSS3, JavaScript, AngularJS.
- Used source debuggers and visual development environments.
- Experience in Testing and documenting software for client applications.
- Wrote code to create single-threaded, multi-threaded, and user-interface event-driven applications, both stand-alone and those that access servers or services.
- Good experience with object-oriented programming (OOP) concepts.
- Good experience using data modeling techniques to derive results from SQL and PL/SQL queries.
- Good working knowledge on Spring Framework.
- Strong Experience in writing SQL queries.
- Experience working with different databases, such as Oracle, SQL Server, MySQL and writing stored procedures, functions, joins, and triggers for different Data Models.
- Expertise in implementing Service Oriented Architectures (SOA) with XML based Web Services (SOAP/REST).
TECHNICAL SKILLS
Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Pig, Sqoop, Flume, Oozie, Hadoop distributions, HBase, Spark
Programming Languages: Java (5, 6, 7), Python, Scala
Databases/RDBMS: MySQL, SQL/PL-SQL, MS-SQL Server 2005, Oracle 9i/10g/11g
Scripting/ Web Languages: JavaScript, HTML5, CSS3, XML, SQL, Shell
NoSQL/Search Stores: Cassandra, HBase, Elasticsearch
Operating Systems: Linux, Windows XP/7/8
Software Life Cycles: SDLC, Waterfall and Agile models
Office Tools: MS Office, MS Project, Risk Analysis tools, Visio
Utilities/Tools: Eclipse, Tomcat, NetBeans, JUnit, SQL, SVN, Log4j, SoapUI, Ant, Maven, automation tools, MRUnit
Cloud Platforms: Confidential EC2
PROFESSIONAL EXPERIENCE
Confidential, Irvine, TX
Sr. Big Data/Hadoop Developer
Responsibilities:
- Worked on the AI Processor piece of SOI. Well versed in building CI/CD pipelines with Jenkins, using a tech stack of GitLab, Jenkins, Helm, and Kubernetes.
- Worked on Spark Structured Streaming to develop a live streaming data pipeline with Kafka as the source and insights written to Cassandra DB; the data arrived in JSON/XML format and was then stored in Cassandra (a minimal sketch follows this list).
- Involved in developing the Insight Store data model for Cassandra, which was used to store the transformed data.
- Worked on developing a data pipeline to ingest Hive tables and file feeds and generate insights into Cassandra DB.
- Worked on installing, configuring, and monitoring Apache Airflow for running both batch and streaming workflows.
- Deployed VMs, storage, networks, and resource groups through the Azure Portal.
- Created storage pools and striped disks for Azure Virtual Machines. Backed up, configured, and restored Azure Virtual Machines using Azure Backup.
- Optimized PySpark jobs to run on a Kubernetes cluster for faster data processing.
- Optimized Hive queries using best practices and the right parameters, with technologies such as Hadoop, YARN, Python, and PySpark.
- Good working knowledge on Snowflake and Teradata databases.
- Extensively worked on Spark with Scala on the cluster for computational analytics; installed it on top of Hadoop and performed advanced analytics using Spark with Hive and SQL/Oracle/Snowflake.
- Took backups to a cloud storage account using CloudBerry cloud storage tools. Configured site-to-site VPN connectivity.
- Converted existing virtual machines from Standard to Premium storage accounts. Patched and validated virtual machines in Azure.
- Launched multi-node Kubernetes cluster in Google Kubernetes Engine (GKE) and migrated the dockerized application from AWS to GCP.
- Hands-on experience with GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil, the bq command-line utility, Dataproc, and Stackdriver.
- Monitor Azure Infrastructure through System Center Operation Manager (SCOM).
- Worked on setting up and configuring the ELK stack for error log capture and management.
- Worked on load testing multiple production Cassandra clusters with reads/writes of up to 1 million records.
- Involved in and well versed with the current SOI architecture for generating NBAs, including Scoring Model NBAs and Batch Model NBAs.
- Involved in optimizing Cassandra keyspaces for low latency and high fault tolerance.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Developed multiple POCs using Scala and deployed on the Yarn cluster, compared the performance of Spark, with Hive and SQL/Teradata.
- Analyzed the SQL scripts and designed the solution to implement using Scala.
- Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Worked with the AWS cloud; created EMR clusters with Spark for analyzing and processing raw data and accessed data from S3 buckets.
- Used Spark-Streaming APIs to perform necessary transformations.
- Worked with spark to consume data from Kafka and convert that to common format using Scala.
- Converted existing Map Reduce jobs into Spark transformations and actions using Spark RDDs, Data frames and Spark SQL APIs.
- Good Knowledge in Confidential AWS concepts like EMR and EC2 web services which provides fast and efficient processing.
- Wrote new spark jobs in Scala to analyze the data of the customers and sales history.
- Developed UDFs in Java for Hive and Pig, and worked on reading multiple data formats on HDFS using Scala.
- Developed Scripts, automated data management from end to end, and synchronize up between all the clusters.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Worked on Spark SQL: created data frames by loading data from Hive tables, created prep data, and stored it in AWS S3 (see the sketch after this role's environment line).
- Designed and customized data models for a data warehouse supporting data from multiple sources in real time. Involved in building the ETL architecture and source-to-target mappings to load data into the data warehouse. Created mapping documents to outline data flow from sources to targets.
- Involved in migration from Livelink to SharePoint using Scala through Restful web service.
- Extensively involved in developing Restful API using JSON library of Play framework.
- Used AWS infrastructure to host the portal. Used EC2, RDS, and S3 features of AWS.
- Used Scala collection framework to store and process the complex consumer information.
- Used Scala functional programming concepts to develop business logic.
- Designed and implemented Apache Spark Application (Cloudera)
- Imported and exported data into HDFS using Sqoop, Flume, and Kafka.
- Troubleshot and debugged Hadoop ecosystem run-time issues.
- Analyzed affected code-line objects and designed suitable algorithms to address problems.
- Assisted in performing unit testing of Map Reduce jobs using MRUnit.
- Assisted in exporting data into Cassandra and writing column families to provide fast listing outputs.
- Used the Oozie scheduler to automate the pipeline workflow and orchestrate the MapReduce jobs that extract the data.
- Used Zookeeper for providing coordinating services to the cluster.
- Worked with the Hue GUI for easy job scheduling, file browsing, job browsing, and Metastore management.
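The Kafka-to-Cassandra streaming pipeline described above follows a standard Spark Structured Streaming pattern; a minimal Scala sketch is shown below. The topic, brokers, payload schema, keyspace, and table names are illustrative placeholders, and the Cassandra write assumes the DataStax spark-cassandra-connector is available on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

object InsightStreamJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-cassandra-insights")
      .getOrCreate()

    // Illustrative schema for the JSON payload carried on the Kafka topic
    val payloadSchema = new StructType()
      .add("customerId", StringType)
      .add("eventType", StringType)
      .add("eventTime", TimestampType)

    // Read the raw stream from Kafka (brokers and topic are placeholders)
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "insight-events")
      .load()

    // Parse the Kafka value column from JSON into typed fields
    val parsed = raw
      .selectExpr("CAST(value AS STRING) AS json")
      .select(from_json(col("json"), payloadSchema).as("data"))
      .select("data.*")

    // Write each micro-batch to Cassandra via the spark-cassandra-connector
    val query = parsed.writeStream
      .foreachBatch { (batch: DataFrame, batchId: Long) =>
        batch.write
          .format("org.apache.spark.sql.cassandra")
          .option("keyspace", "insight_store")   // hypothetical keyspace
          .option("table", "customer_insights")  // hypothetical table
          .mode("append")
          .save()
      }
      .option("checkpointLocation", "hdfs:///checkpoints/insights")
      .start()

    query.awaitTermination()
  }
}
```

Using foreachBatch keeps the sink logic identical to a batch write, which is a common choice when the connector in use does not support streaming writes directly.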
Environment: Apache Hadoop, HDFS, Hive, Java, Sqoop, Spark, GCP, Cloudera CDH4, Oracle, MySQL, Tableau, Talend, Elasticsearch, Kibana, SFTP.
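The Hive-to-S3 prep step called out in the responsibilities above can be sketched as a small Spark SQL job. The database/table names, the aggregation, and the S3 path are assumptions for illustration; enableHiveSupport presupposes a configured Hive metastore and the s3a connector being available to the cluster.

```scala
import org.apache.spark.sql.SparkSession

object PrepDataToS3 {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport assumes the cluster's Hive metastore is configured
    val spark = SparkSession.builder()
      .appName("hive-to-s3-prep")
      .enableHiveSupport()
      .getOrCreate()

    // Example prep step over an illustrative Hive table: daily sales per customer
    val prep = spark.sql(
      """SELECT customer_id, sale_date, SUM(amount) AS daily_amount
        |FROM analytics.sales_history
        |GROUP BY customer_id, sale_date""".stripMargin)

    // Persist the prepared data to S3 as Parquet (bucket and path are placeholders)
    prep.write
      .mode("overwrite")
      .parquet("s3a://example-bucket/prep/daily_sales/")

    spark.stop()
  }
}
```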
Confidential, Plano, TX
Sr. Spark Scala Developer
Responsibilities:
- Worked with BI team in the area of Big Data Hadoop cluster implementation and data integration in developing large-scale system software.
- Expert in understanding data and designing/implementing enterprise platforms such as Hadoop data lakes and large data warehouses.
- Processed incoming files using the Spark native API.
- Used the Spark Streaming and Spark SQL APIs to process the files.
- Experience in understanding data and designing/implementing enterprise platforms such as a Hadoop data lake.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Involved in Analysis, coding, testing, user acceptance testing, production implementation and system support for the Enterprise Data Warehouse Application.
- Architected and designed the data flow for the collapse of 4 legacy data warehouses into an AWS Data Lake.
- Processed schema-oriented and non-schema-oriented data using Scala and Spark.
- Developed a Flume ETL job handling data from an HTTP source with HDFS as the sink.
- Collected JSON data from the HTTP source and developed Spark APIs that perform inserts and updates in Hive tables.
- Implemented Containerized Python application deployment using Docker and Azure Kubernetes Service.
- Designed roles and groups using Azure Identity and Access Management (IAM).
- Developed Spark applications in Python (PySpark) on a distributed environment to load large numbers of CSV files with different schemas into Hive ORC tables (see the sketch after this role's environment line).
- Worked on reading and writing multiple data formats such as JSON, ORC, and Parquet on HDFS using PySpark.
- Analyzed the SQL scripts and redesigned them using PySpark SQL for faster performance.
- Migrated SQL Server 2008 database to Windows Azure SQL Database.
- Created Hive tables and involved in data loading and writing Hive UDFs.
- Developed Spark scripts to import large files from Confidential S3 buckets.
- Developed Spark core and Spark SQL scripts using Scala for faster data processing.
- Developed a Kafka consumer API in Scala for consuming data from Kafka topics (a minimal sketch follows this list).
- Developing Spark jobs using Scala in test environment for faster real time analytics and used Spark SQL for querying.
- Expertise in Snowflake for creating and maintaining tables and views.
- Worked on importing and exporting data from Snowflake, Oracle, and DB2 into HDFS and Hive using Sqoop for analysis, visualization, and report generation.
- Experience in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and in coordinating tasks among the team.
- Developed and deployed the outcome using Spark and Scala code on a Hadoop cluster running on GCP.
- Developed and designed a system to collect data from multiple portals using Kafka and then process it using Spark.
- Involved in dimensional modeling (star schema) of the data warehouse and used Erwin to design the business processes, dimensions, and measured facts.
- Developed and designed an automated process using shell scripting for data movement and purging.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDDs/MapReduce in Spark 1.6 for data aggregation and queries, writing data back into the OLTP system through Sqoop.
- Handled large datasets using partitions, Spark in-memory capabilities, broadcasts, and effective and efficient joins and transformations during the ingestion process itself.
- Developed Spark code using Python and Spark SQL/Streaming for faster processing of data.
- Developed functional programs in Scala to connect to the streaming data application, gathering web data in JSON and XML and passing it to Flume.
- Extensively used Scala to connect to and retrieve data from NoSQL databases such as MongoDB, Cassandra, and HBase, as well as from Pig and Hive.
- Connected to AWS EC2 using SSH and ran spark-submit jobs.
- Involved in administering, installing, upgrading, and managing CDH3, Pig, Hive, and HBase.
- Played a key role in setting up a 50-node Hadoop cluster utilizing Apache Spark, working closely with the Hadoop administration team.
- Designed, developed, and maintained data integration programs in Hadoop and RDBMS environments with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores, for data access and analysis.
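A minimal sketch of the kind of Scala Kafka consumer referenced above, using the standard Kafka client library; the broker address, topic, and consumer group are placeholders, and the per-record handling is reduced to a print for illustration.

```scala
import java.time.Duration
import java.util.Properties

import scala.jdk.CollectionConverters._

import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.serialization.StringDeserializer

object TopicConsumer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")   // placeholder brokers
    props.put("group.id", "portal-ingest")           // placeholder consumer group
    props.put("key.deserializer", classOf[StringDeserializer].getName)
    props.put("value.deserializer", classOf[StringDeserializer].getName)
    props.put("auto.offset.reset", "earliest")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(List("portal-events").asJava)  // placeholder topic

    try {
      while (true) {
        // Poll for new records and hand each message to downstream processing
        val records = consumer.poll(Duration.ofMillis(500))
        for (record <- records.asScala) {
          println(s"offset=${record.offset()} key=${record.key()} value=${record.value()}")
        }
      }
    } finally {
      consumer.close()
    }
  }
}
```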
Environment: Hadoop, MapReduce, HDFS, Scala, GCP, Spark, Cloudera Manager, Pig, Sqoop, ZooKeeper, Teradata, PL/SQL, MySQL, Windows, HBase.
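The CSV-to-Hive-ORC load described in this role was done with PySpark; to keep a single language across these sketches, the equivalent flow is shown below in Scala. The landing path, table name, and use of schema inference are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

object CsvToHiveOrc {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-to-hive-orc")
      .enableHiveSupport()   // assumes a configured Hive metastore
      .getOrCreate()

    // Read a directory of CSV files; schema inference is used here for brevity,
    // whereas per-feed schemas would normally be supplied explicitly
    val feed = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///landing/feeds/customers/")   // placeholder landing path

    // Write the data into a Hive-managed ORC table (illustrative name)
    feed.write
      .mode("overwrite")
      .format("orc")
      .saveAsTable("staging.customers_orc")

    spark.stop()
  }
}
```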
Confidential, Seattle, WA
Bigdata Developer
Responsibilities:
- Wrote MapReduce code to process all the log files against rules defined in HDFS (log files generated by different devices follow different XML rules).
- Developed and designed application to process data using Spark.
- Developed MapReduce jobs, Hive & PIG scripts for Data warehouse migration project.
- Developed and designed a system to collect data from multiple portals using Kafka and then process it using Spark.
- Developed MapReduce jobs and Hive & Pig scripts for a Risk & Fraud Analytics platform.
- Developed a data ingestion platform using Sqoop and Flume to ingest Twitter and Facebook data for a Marketing & Offers platform.
- Developed and designed an automated process using shell scripting for data movement and purging.
- Installation & Configuration Management of a small multi node Hadoop cluster.
- Installation and configuration of other open source software like Pig, Hive, Flume, Sqoop.
- Developed programs in Java and Scala/Spark to reformat data after extraction from HDFS for analysis.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying on the log data (see the sketch after this list).
- Importing and exporting data into Impala, HDFS and Hive using Sqoop.
- Responsible to manage data coming from different sources.
- Implemented Partitioning, Dynamic Partitions and Buckets in HIVE for efficient data access.
- Developed Hive tables to transform, analyze the data in HDFS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Involved in running Hadoop Jobs for processing millions of records of text data.
- Developed the application by using the Struts framework.
- Created connection through JDBC and used JDBC statements to call stored procedures.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Developed Pig UDFs to pre-process the data for analysis (a sketch follows this role's environment line).
- Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing.
- Moved RDBMS data and flat files generated from various channels into HDFS for further processing.
- Developed job workflows in Oozie to automate the tasks of loading the data into HDFS.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and extracted data from Teradata into HDFS using Sqoop.
- Writing the script files for processing data and loading to HDFS.
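The log-parsing and partitioning work in this role can be sketched as a small Spark/Scala job that structures raw log lines and stores them in a date-partitioned Hive table. The log layout, regex, paths, and table name are illustrative assumptions; in practice the parsing rules came from the per-device XML rules noted above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_extract, to_date}

object LogsToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("weblog-to-hive")
      .enableHiveSupport()   // assumes a configured Hive metastore
      .getOrCreate()

    // Read raw web-server log lines from HDFS (path is a placeholder)
    val raw = spark.read.text("hdfs:///logs/webserver/")

    // Illustrative line layout: "<ip> <timestamp> <url> <status>"
    val pattern = """^(\S+)\s+(\S+)\s+(\S+)\s+(\d{3})$"""

    val parsed = raw.select(
      regexp_extract(col("value"), pattern, 1).as("ip"),
      regexp_extract(col("value"), pattern, 2).as("ts"),
      regexp_extract(col("value"), pattern, 3).as("url"),
      regexp_extract(col("value"), pattern, 4).cast("int").as("status")
    ).withColumn("log_date", to_date(col("ts")))

    // Store the structured records in a date-partitioned Hive ORC table
    parsed.write
      .mode("append")
      .partitionBy("log_date")
      .format("orc")
      .saveAsTable("logs.web_events")

    spark.stop()
  }
}
```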
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Java (jdk1.7), Flat files, Oracle 11g/10g, PL/SQL, SQL*PLUS, Windows NT, Sqoop.
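Pig UDFs such as the pre-processing one mentioned above are typically written in Java; to stay with a single language for these sketches, the same idea is shown in Scala (any JVM language can extend EvalFunc). The cleansing rule and class name are hypothetical.

```scala
import org.apache.pig.EvalFunc
import org.apache.pig.data.Tuple

// Hypothetical pre-processing UDF: trim and lower-case a chararray field
class NormalizeText extends EvalFunc[String] {
  override def exec(input: Tuple): String = {
    if (input == null || input.size() == 0 || input.get(0) == null) {
      null
    } else {
      input.get(0).toString.trim.toLowerCase
    }
  }
}
```

In a Pig Latin script, the compiled jar would be loaded with REGISTER and the function invoked by its class name.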
Confidential, Irvine, CA
Hadoop Developer
Responsibilities:
- Processed data into HDFS by developing solutions.
- Analyzed the data using MapReduce, Pig, the Spark API, and Hive, and produced summary results from Hadoop for downstream systems.
- Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
- Developed data pipeline using flume, Sqoop and pig to extract the data from weblogs and store in HDFS.
- Used Sqoop to import and export data from HDFS to RDBMS and vice-versa.
- Exported the analyzed data to the relational database MySQL using Sqoop for visualization and to generate reports.
- Created HBase tables to load large sets of structured data.
- Managed and reviewed Hadoop log files.
- Involved in providing inputs for estimate preparation for the new proposal.
- Worked extensively with HIVE DDLs and Hive Query language (HQLs).
- Developed UDF, UDAF, and UDTF functions and implemented them in Hive queries (a minimal UDF sketch follows this list).
- Implemented Sqoop for large dataset transfers between Hadoop and RDBMSs.
- Created MapReduce jobs to convert periodic XML messages into partitioned Avro data.
- Used Sqoop widely in order to import data from various systems/sources (like MySQL) into HDFS.
- Created components like Hive UDFs for missing functionality in HIVE for analytics.
- Developed scripts and batch jobs to schedule an Oozie bundle (a group of coordinators).
- Used different file formats like Text files, Sequence Files, Avro.
- Provided cluster coordination services through ZooKeeper.
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Assisted in cluster maintenance, cluster monitoring, adding and removing cluster nodes, and troubleshooting.
- Installed and configured Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
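The custom Hive functions mentioned in this role can be illustrated with a simple one-row-in, one-value-out UDF. It is shown in Scala to match the other sketches (the classic org.apache.hadoop.hive.ql.exec.UDF base class works from any JVM language); the masking rule and class name are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDF

// Hypothetical UDF: mask all but the last four characters of an identifier
class MaskId extends UDF {
  def evaluate(id: String): String = {
    if (id == null) null
    else if (id.length <= 4) id
    else "*" * (id.length - 4) + id.takeRight(4)
  }
}
```

After packaging into a jar, it would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION and then used like any built-in function.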
Environment: Hadoop, HDFS, Map Reduce, Hive, Pig, Sqoop, HBase, Shell Scripting, Oozie, Oracle 11g.
Confidential
Java Developer
Responsibilities:
- Involved in analysis and design phase of Software Development Life cycle (SDLC).
- Used JMS to pass messages as payload to track statuses, milestones and states in the workflows.
- Involved in reading and generating PDF documents using iText, and merged the PDFs dynamically.
- Involved in the software development life cycle coding, testing, and implementation.
- Worked in the health-care domain.
- Involved in Using Java Message Service (JMS) for loosely coupled, reliable and asynchronous exchange of patient treatment information among J2EE components and legacy system
- Developed MDBs using JMS to exchange messages between different applications using MQ Series.
- Involved in working with J2EE Design patterns (Singleton, Factory, DAO, and Business Delegate) and Model View Controller Architecture with JSF and Spring DI.
- Involved in Content Management using XML.
- Developed a standalone module transforming XML 837 data into the database using a SAX parser.
- Installed, Configured and administered WebSphere ESB v6.x
- Worked on Performance tuning of WebSphere ESB in different environments on different platforms.
- Configured and Implemented web services specifications in collaboration with offshore team.
- Involved in creating dashboard charts (business charts) using FusionCharts.
- Involved in creating reports for most of the business criteria.
- Involved in the configuration of WebLogic servers, data sources, JMS queues, and deployments.
- Involved in creating queues, MDBs, and workers to accommodate the messaging needed to track the workflows.
- Created Hibernate mapping files, sessions, transactions, and Query and Criteria objects to fetch data from the DB.
- Enhanced the design of an application by utilizing SOA.
- Generating Unit Test cases with the help of internal tools.
- Used JNDI for connection pooling.
- Developed ANT scripts to build and deploy projects onto the application server.
- Involved in implementing CruiseControl as the continuous build tool using Ant.
- Used StarTeam as the version control system.
Environment: Java/J2EE, HTML, JS, AJAX, Servlets, JSP, XML, XSLT, XPath, XQuery, WSDL, SOAP, REST, JAX-RS, Jersey, JAX-WS, WebLogic Server 10.3.3, JMS, iText, Eclipse, JUnit, StarTeam, JNDI, Spring Framework (DI, AOP, Batch), Hibernate.
Confidential
Java Developer
Responsibilities:
- Implemented various J2EE standards and MVC framework involving the usage of Struts, JSP, AJAX and servlets for UI design.
- Used SOAP/ REST for the data exchange between the backend and user interface.
- Utilized Java and MySQL from day to day to debug and fix issues with client processes.
- Developed, tested, and implemented financial-services application to bring multiple clients into standard database format.
- Assisted in designing, building, and maintaining database to analyze life cycle of checking and debit transactions.
- Created web service components using SOAP, XML and WSDL to receive XML messages and for the application of business logic.
- Involved in configuring WebSphere variables, queues, data sources, and servers, and deploying EARs onto the servers.
- Involved in developing the business Logic using Plain Old Java Objects (POJOs) and Session EJBs.
- Developed authentication through LDAP by JNDI.
- Developed and debugged the application using Eclipse IDE.
- Involved in Hibernate mappings, configuration properties set up, creating sessions, transactions and second level cache set up.
- Involved in backing up the database and creating dump files, and in creating DB schemas from dump files. Wrote and executed developer test cases and prepared the corresponding scope and traceability matrix.
- Used JUnit to develop test cases for all the modules and JAD for debugging.
- Hands-on experience with Sun ONE Application Server, WebLogic Application Server, WebSphere Application Server, WebSphere Portal Server, and J2EE application deployment technology.
Environment: Java (multithreading), JDBC, Hibernate, Struts, Collections, Maven, Subversion, JUnit, SQL, JSP, SOAP, Servlets, Spring, Oracle, XML, PuTTY, Eclipse.