Hadoop Developer/ Data Engineer Resume
Los Angeles, CA
SUMMARY
- Overall 10 years of professional IT experience, with 5 years in analysis, architectural design, prototyping, development, integration and testing of applications using Java/J2EE technologies and 6 years in Big Data analytics as a Hadoop Developer.
- Experience in developing MapReduce programs using Apache Hadoop to analyze big data per requirements.
- Hands-on experience in creating Apache Spark RDD transformations on datasets in the Hadoop data lake (a brief Scala sketch follows this summary).
- Extensive experience in developing Pig Latin scripts and using Hive Query Language for data analytics.
- Hands-on experience working with NoSQL databases, including HBase and Cassandra, and their integration with the Hadoop cluster.
- Good working experience using Sqoop to import data into HDFS from RDBMS and vice versa.
- Experienced in data ingestion projects, ingesting data into the data lake from multiple source systems using Talend Big Data.
- Experience in Hadoop administration activities such as installation and configuration of clusters using Apache, Cloudera and AWS.
- Good experience in building pipelines using Azure Data Factory and moving the data into Azure Data Lake Store.
- Hands on experience in solving software design issues by applying design patterns including Singleton Pattern, Business Delegator Pattern, Controller Pattern, MVC Pattern, Factory Pattern, Abstract Factory Pattern, DAO Pattern and Template Pattern
- Experienced in creative and effective front-end development using JSP, JavaScript, HTML5, DHTML, XHTML, Ajax and CSS.
- Experience in analysis, design, development and integration using Big Data/Hadoop technologies such as MapReduce, Hive, Pig, Sqoop, Oozie, Kafka streaming, HBase, Azure, AWS, Cloudera, Hortonworks, Impala, Avro, data processing, Java/J2EE and SQL.
- Good knowledge of Hadoop architecture and its components such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode and DataNode.
- Experience with the SCOPE language to communicate with COSMOS for data integration.
- Experience in creating, automating and managing COSMOS/SCOPE jobs on the VCs.
- Experienced in data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning and advanced data processing.
- Propose architectures considering cost/spend in Azure and develop recommendations to right-size data infrastructure.
- Expertise in synthesizing Machine learning, Predictive Analytics and Big data technologies into integrated solutions.
- Excellent experience and knowledge of machine learning, mathematical modeling and operations research. Comfortable with R, Python, SAS, Weka, MATLAB and relational databases. Deep understanding of and exposure to the Big Data ecosystem.
- Strong virtualization experience, including datacenter migration and Azure Data Services.
- Experience in troubleshooting and resolving architecture problems including database and storage, network, security and applications
- Extensive experience in developing strategies for Extraction, Transformation and Loading data from various sources into Data Warehouse and Data Marts using DataStage.
- Having extensive experience in data integration and migration using IBM InfoSphere DataStage (9.1), QualityStage, SSIS, Oracle, Teradata, DB2, SQL and shell scripting, along with technical certifications in ETL development from IBM and Cloudera.
- Good exposure to function point analysis during estimation, planning and design on the DataStage platform, through implementation.
- Extensive ETL tool experience using Talend Enterprise Edition, IBM InfoSphere/WebSphere DataStage, Ascential DataStage, Big Data Hadoop and SSIS. Worked on DataStage client tools such as DataStage Designer, DataStage Director and DataStage Administrator.
- Experienced in scheduling sequence, parallel and server jobs using DataStage Director, UNIX scripts and scheduling tools. Designed and developed parallel jobs, server and sequence jobs using DataStage Designer.
- Implement ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight.
- Design and implement migration strategies for traditional systems on Azure (lift and shift, Azure Migrate, other third-party tools). Worked on the Azure suite: Azure SQL Database, Azure Data Lake Store (ADLS), Azure Data Factory (ADF) V2, Azure SQL Data Warehouse, Azure Service Bus, Azure Key Vault, Azure Analysis Services (AAS), Azure Blob Storage, Azure Search, Azure App Service and Azure Data Platform Services.
- Design and implement end-to-end data solutions (storage, integration, processing, visualization) in Azure
- Worked on Big Data Cosmos as a source and built numerous reports on top of it, using complex SCOPE queries and Power BI as the reporting end for the Universal Store BizOps team.
- Configured the Hive metastore with MySQL, which stores the metadata for Hive tables. Extensive experience in creating data pipelines for real-time streaming applications using Kafka, Flume, Storm and Spark Streaming, and performed sentiment analysis on a Twitter source.
- Architect the data lake by cataloging the source data, analyzing entity relationships, and aligning the design as per performance, schedule & reporting requirements
- Experience in development, support and maintenance of ETL (Extract, Transform and Load) processes using Talend Integration Suite.
- Worked on migrating the Power BI reports from Cosmos 11 to Cosmos 15.
- IBM DataStage/Talend ETL developer with 8+ years in information technology, having worked on the design, development, administration and implementation of various database and data warehouse technologies (Talend Enterprise Edition and IBM DataStage v9.x/8.x/7.x) using components such as Administrator, Manager, Designer and Director.
- Fluent in data mining and machine learning, including classification, clustering, regression and anomaly detection. In-depth understanding of scalable machine learning libraries such as Apache Mahout and MLlib.
- Implemented an AWS data lake leveraging S3, Terraform, Vagrant/Vault, EC2, Lambda, VPC and IAM for data processing and storage, while writing complex SQL queries and analytical and aggregate functions on views in the Snowflake data warehouse to develop near real-time visualizations using Tableau Desktop/Server 10.4 and Alteryx.
- Good exposure to web services using CXF/XFire and Apache Axis for the exposure and consumption of SOAP messages.
- Working knowledge of databases such as Oracle 8i/9i/10g, Microsoft SQL Server and DB2. Experience in writing numerous test cases using the JUnit framework with Selenium.
- Leveraged AWS, Informatica Cloud, Snowflake Data Warehouse, the HashiCorp platform, AutoSys and Rally Agile/Scrum to implement data lake, enterprise data warehouse and advanced data analytics solutions based on data collection and integration from multiple sources (Salesforce, Salesconnect, S3, SQL Server, Oracle, NoSQL and mainframe systems).
- Strong work ethic with a desire to succeed and make significant contributions to the organization. Strong problem-solving skills, good communication and interpersonal skills, and a good team player.
- Expertise in writing Hadoop jobs for analyzing structured and unstructured data using HDFS, Hive, HBase, Pig, Spark, Kafka streaming, Scala, Oozie and Talend ETL.
- Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, YARN and MapReduce concepts.
- Strong understanding of data warehouse and data lake technology.
- Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java. Experience in importing/exporting data using Sqoop between HDFS and relational database systems.
- Good experience at Confidential developing Big Data based solutions using Hadoop and Spark in the information retrieval and machine learning areas.
- Experienced in implementing real-time streaming and analytics using technologies such as Spark Streaming and Kafka.
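A minimal Scala sketch of the Spark RDD transformation work described in this summary; the HDFS paths, record layout and session settings are illustrative assumptions rather than project specifics.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: RDD transformations over a dataset landed in a Hadoop data lake.
// The HDFS paths and the record layout (comma-separated: id, category, amount) are assumed.
object RddTransformSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-transform-sketch")
      .getOrCreate()

    val raw = spark.sparkContext.textFile("hdfs:///datalake/raw/transactions/*.csv")

    // Parse each line, drop malformed rows, and aggregate amount per category.
    val totalsByCategory = raw
      .map(_.split(","))
      .filter(_.length == 3)
      .map(cols => (cols(1), cols(2).toDouble))
      .reduceByKey(_ + _)

    totalsByCategory.saveAsTextFile("hdfs:///datalake/curated/category_totals")
    spark.stop()
  }
}
```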
TECHNICAL SKILLS
Big Data Technologies: Hive, Hadoop, MapReduce, HDFS, Sqoop, R, Flume, Spark, Apache Kafka, HBase, Pig, Elasticsearch, AWS, Oozie, ZooKeeper, Apache Hue, Apache Tez, YARN, Talend, Storm, Impala, Tableau and QlikView.
Programming Languages: Java (JDK 1.4/1.5/1.6), C, C++, Scala, Python, MATLAB, R, HTML, SQL, PL/SQL, T-SQL, Pig Latin, HiveQL, J2EE
Frameworks: Hibernate 2.x/3.x, Spring 2.x/3.x, Struts 1.x/2.x and JPA
Web Services: WSDL, SOAP, Apache CXF/XFire, Apache Axis, REST, Jersey
Client Technologies: jQuery, JavaScript, AJAX, CSS, HTML 5, XHTML
Operating Systems: UNIX, Windows, LINUX
Application Servers: IBM WebSphere, Tomcat, WebLogic
Web technologies: JSP, Servlets, Socket Programming, JNDI, JDBC, Java Beans, JavaScript, Web Services JAX-WS
Databases: Oracle 8i/9i/10g, Microsoft SQL Server, DB2, MySQL 4.x/5.x
Java IDE: Eclipse 3.x, IBM Web Sphere Application Developer, IBM RAD 7.0
Tools: TOAD, SQL Developer, SOAP UI, ANT, Maven, Visio, Rational Rose, Datastage
PROFESSIONAL EXPERIENCE
Hadoop Developer/ Data Engineer
Confidential, Los Angeles CA
Responsibilities:
- Hands-on experience with Apache Spark and its components (Spark Core and Spark SQL), and hands-on experience in in-memory data processing with Apache Spark and Apache NiFi/MiNiFi.
- Installed Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Coordinated with business customers to gather business requirements, interacted with other technical peers to derive technical requirements, and delivered the BRD and TDD documents.
- Extensively involved in the Design phase and delivered Design documents.
- Involved in testing and coordination with the business during user testing. Imported and exported data into HDFS and Hive using Sqoop. Developed a working prototype for real-time data ingestion and processing using Kafka, Spark Streaming and HBase.
- Analyzed, designed and built modern data solutions using Azure PaaS services to support visualization of data. Understood the current production state of the application and determined the impact of the new implementation on existing business processes.
- Worked on Data lake store, Data lake analytics and on creating Data factory pipelines.
- Developed U-SQL Scripts for schematizing the data in Azure Data Lake Analytics.
- Worked on various ad-hoc requests for extracting big data from Cosmos through SCOPE scripting in Visual Studio 2012/2014.
- Performed big data aggregation using Cosmos scripts and converted the data to SQL for weekly reporting.
- Developed data pulls from Cosmos using SCOPE scripts.
- Worked with the Confidential COSMOS distributed systems technology. All the datasets related to PKRS & APS were moved to COSMOS.
- Extensively used StreamSets Data Collector to create ETL pipelines for pulling data from RDBMS systems into HDFS.
- Experienced with Kafka streaming using StreamSets to process continuous ingestion of data from Oracle systems into the Hive warehouse.
- Extracted, transformed and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Developed JSON scripts for deploying the pipeline in Azure Data Factory (ADF) that processes the data using the SQL activity.
- Data analytics and engineering experience in multiple Azure platforms such as Azure SQL, Azure SQL Data warehouse, Azure Data Factory, Azure Storage Account etc. for source stream extraction, cleansing, consumption and publishing across multiple user bases.
- Involved in the data ingestion process through DataStage to load data into HDFS from mainframes, Greenplum, Teradata and DB2.
- Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
- Design and implement migration strategies for traditional systems on Azure (lift and shift, Azure Migrate, other third-party tools).
- Involved in developing a linear regression model to predict a continuous measurement for improving observations on wind turbine data, developed using Spark with the Scala API.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.
- Used Spark and Spark SQL to read Parquet data and create tables in Hive using the Scala API (see the first sketch following this list).
- Configured Spark Streaming to receive real-time data from Apache Kafka and store the streamed data to HDFS using Scala (see the second sketch following this list).
- Experienced in developing scripts for doing transformations using Scala.
- Used Kafka for publish-subscribe messaging as a distributed commit log and experienced its speed, scalability and durability.
- Developed user interfaces using AJAX, JavaScript, JSON, HTML5, and CSS3.
- Provided a responsive, AJAX-driven design using JavaScript libraries such as Angular.js, Node.js, D3.js, Backbone.js and Bootstrap.js.
- Developed web applications in Groovy/Grails with MongoDB as the data store, using the IntelliJ IDEA IDE with the latest Grails SDK and Java.
- Worked with Angular 2 and TypeScript as part of the migration from Angular and vanilla JavaScript to Angular 2 and React.
- Worked with various MVC JavaScript frameworks such as Angular.js, Ext.js, Backbone.js, Node.js, Ember.js, Bootstrap.js, Require.js, D3.js, etc.
- Developed the application as enterprise JavaScript using AngularJS, Node.js, WebSockets, Jasmine, Karma, NPM, Gulp, Protractor, etc.
- Designed the front end with object-oriented JavaScript frameworks such as Angular.js, Node.js, Backbone.js, Knockout.js, React.js/Redux, Spine.js, Ember.js, Require.js, Express.js and Pdf.js, with experience in client-side templating such as Handlebars.js.
- Involved in the development of presentation layer and GUI framework using HTML. Client Side validations were done using JavaScript and AngularJs.
- Worked on Java/J2EE framework APIs like Spring, iBatis and Hibernate.
- Used Reactive Extensions for JavaScript (RxJS) in Angular2 to make the HTTP requests to the REST API for getting the patient details.
- Responsible for building scalable distributed data solutions using MongoDB and Cassandra.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Provided support to data analysts in running Pig and Hive queries. Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL). Applied transformations and filtered traffic using Pig.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster. Responsible for building scalable distributed data solutions on a cluster using Cloudera Distribution.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs. Setup and benchmarked Hadoop and HBase clusters for internal use.
- Responsible for building scalable distributed data solutions using Hadoop. Implemented a nine-node CDH3 Hadoop cluster on Red Hat Linux.
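First sketch referenced above: a minimal Scala/Spark SQL example of reading Parquet and persisting it as a Hive table; the paths, database and table names are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: read Parquet from HDFS and persist it as a Hive table.
// enableHiveSupport() requires a configured Hive metastore; paths and names are assumed.
object ParquetToHiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-to-hive-sketch")
      .enableHiveSupport()
      .getOrCreate()

    val events = spark.read.parquet("hdfs:///datalake/curated/events")

    // Register the data as a managed Hive table for downstream HiveQL analysis.
    events.write.mode("overwrite").saveAsTable("analytics.events")

    spark.sql("SELECT COUNT(*) FROM analytics.events").show()
    spark.stop()
  }
}
```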
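Second sketch referenced above: a hedged Scala example using the Spark Streaming Kafka direct connector to land a topic on HDFS; broker addresses, topic name, batch interval and output path are assumptions.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// Minimal sketch: consume a Kafka topic with Spark Streaming and append each
// micro-batch to HDFS. Broker list, topic and output path are illustrative.
object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs-sketch")
    val ssc = new StreamingContext(conf, Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092,broker2:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "hdfs-sink",
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Write each non-empty micro-batch to a timestamped HDFS directory.
    stream.map(_.value).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///datalake/raw/events/${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```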
Environment: Apache Hadoop 2.0.0, Pig 0.11, Hive 0.10, Sqoop 1.4.3, Flume, MapReduce, JSP, Struts 2.0, NoSQL, HDFS, Teradata, LINUX, Oozie, Hue, HCatalog, Java, IBM Cognos, Oracle 11g/10g, Microsoft SQL Server, Microsoft SSIS, DB2 LUW, TOAD for DB2, IBM Data Studio, AIX 6.1, UNIX Scripting
Hadoop Developer/ Big Data Engineer
Confidential
Responsibilities:
- As a Big Data Developer, implemented solutions for ingesting data from various sources and processing Data-at-Rest utilizing Big Data technologies such as Hadoop, MapReduce frameworks, HBase, Hive, Oozie, Flume, Sqoop, etc.
- Designed and Implemented real-time Big Data processing to enable real-time analytics, event detection and notification for Data-in-Motion .
- Hands-on experience with IBM Big Data product offerings such as IBM InfoSphere BigInsights, IBM InfoSphere Streams, IBM BigSQL .
- Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files (see the sketch following this list).
- Expert in implementing Spark using Scala and Spark SQL for faster testing and processing of data; responsible for managing data from different sources.
- Experienced in creating data pipelines integrating Kafka with Spark Streaming applications, using Scala to write the applications.
- Used Spark SQL to read data from external sources and processed the data using the Scala computation framework.
- Created many complex ETL jobs for data exchange to and from database servers and various other systems, including RDBMS, XML, CSV and flat file structures. Integrated Java code inside Talend Studio using components such as tJavaRow, tJava, tJavaFlex and Routines.
- Experienced in using Talend's debug mode to debug a job and fix errors.
- Responsible for developing, support and maintenance for the ETL (Extract, Transform and Load) processes using Talend Integration Suite.
- Extracted, transformed and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Worked Extensively on Talend Admin Console and Schedule Jobs in Job Conductor.
- Extensive ETL tool experience using Talend Enterprise Edition, IBM InfoSphere/WebSphere DataStage, Ascential DataStage, Big Data Hadoop and SSIS. Worked on DataStage client tools such as DataStage Designer, DataStage Director and DataStage Administrator.
- Experienced in scheduling sequence, parallel and server jobs using DataStage Director, UNIX scripts and scheduling tools. Designed and developed parallel jobs, server and sequence jobs using DataStage Designer.
- Worked on the Architecture of ETL process. Created DataStage jobs (ETL Process) for populating the data into the Data warehouse constantly from different source systems like ODS, flat files, scheduled the same using DataStage Sequencer for SI testing.
- Experienced in loading and transforming large sets of structured, semi-structured and unstructured data from HBase through Sqoop and placing them in HDFS for further processing.
- Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster. Involved in creating Hive tables, loading data and running hive queries on the data.
- Extensive Working knowledge of partitioned table, UDFs, performance tuning, compression-related properties, thrift server in Hive.
- Worked with the NoSQL database HBase to create tables and store data. Developed optimal strategies for distributing the web log data over the cluster, importing and exporting the stored web log data into HDFS and Hive using Sqoop.
- Responsible for estimating the cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
- Developed Java MapReduce programs on log data to transform it into a structured form to find user location, age group and time spent.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
- Analyzed the web log data using the HiveQL to extract number of unique visitors per day, page views, visit duration, most purchased products on the website.
- Created Databricks notebooks using SQL and Python and automated the notebooks using jobs.
- Developed a Kafka producer and a Spark Streaming consumer to read the stream of events per business rules. Loaded various formats of structured and unstructured data from the Linux file system to HDFS.
- Utilized Tableau to visualize the analyzed data and performed report design and delivery. Created a POC for the Flume implementation. Worked on Linux/Unix.
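A small Scala sketch of the Spark SQL work on text/CSV formats mentioned above; the file location, schema options, column names and query are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: load a CSV file with Spark SQL, register it as a view,
// and run a SQL query over it from Scala. Path and column names are assumed.
object CsvSparkSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-spark-sql-sketch")
      .getOrCreate()

    val orders = spark.read
      .option("header", "true")      // first line holds column names
      .option("inferSchema", "true") // let Spark guess column types
      .csv("hdfs:///datalake/raw/orders.csv")

    orders.createOrReplaceTempView("orders")

    // Aggregate order totals per customer using Spark SQL.
    spark.sql(
      """SELECT customer_id, SUM(amount) AS total_amount
        |FROM orders
        |GROUP BY customer_id
        |ORDER BY total_amount DESC""".stripMargin).show(20)

    spark.stop()
  }
}
```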
Environment: Hadoop 1x, Hive 0.10, Pig 0.11, Sqoop, HBase, UNIX Shell Scripting, Scala, Akka, IBM InfoSphere BigInsights, IBM InfoSphere Streams, IBM BigSQL, Java
Hadoop Developer / Big Data Engineer
Confidential
Responsibilities:
- Used the REST API to access HBase data to perform analytics. Worked on loading and transforming large sets of structured, semi-structured and unstructured data.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume. Written Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs. Experienced in managing and reviewing the Hadoop log files.
- Migrated ETL jobs to Pig scripts to do transformations, joins and some pre-aggregations before storing the data in HDFS.
- Developed Oozie workflows and they are scheduled through a scheduler on a monthly basis. Designed and developed read lock capability in HDFS.
- Involved in End-to-End implementation of ETL logic. Involved in designing use-case diagrams, class diagram, interaction using UML model. Designed and developed the application using various design patterns, such as session facade, business delegate and service locator.
- Worked on Maven build tool. Involved in developing JSP pages using Struts custom tags, JQuery and Tiles Framework. Used JavaScript to perform client side validations and Struts-Validator Framework for server-side validation.
- Good experience in Mule development. Developed Web applications with Rich Internet applications using Java applets, Silverlight, JavaFX. Involved in creating Database SQL and PL/SQL queries and stored Procedures.
- Implemented Singleton classes for property loading and static data from DB. Debugged and developed applications using Rational Application Developer (RAD). Developed a Web service to communicate with the database using SOAP.
- Developed DAO (Data Access Objects) using Spring Framework 3. Deployed the components in to WebSphere Application server 7. Actively involved in backend tuning SQL queries/DB script.
- Worked on writing commands using UNIX shell scripting. Used Java to remove an attribute in a JSON file where Scala did not support creating the objects, then converted back to Scala. Worked on Java and Impala and on master clean-up of data.
- Worked on accumulators to count results after executing the job on multiple executors (see the sketch following this list). Worked in the IntelliJ IDE for development and debugging. Worked on Linux/Unix.
- Wrote a set of programs for one of the LOBs in Scala and performed unit testing. Created many SQL schemas and utilized them throughout the program wherever required. Made enhancements to one of the LOBs using Scala.
- Ran spark-submit jobs and analyzed the log files. Used Maven to build .jar files and Sqoop to transfer data between relational databases and Hadoop.
- Worked on HDFS to store and access huge datasets within Hadoop. Good hands-on experience with Git and GitHub; created a feature node on GitHub.
- Pushed the data to GitHub and made a pull request. Experience with JSON and CFF.
- Extensive ETL tool experience using Talend Enterprise Edition, IBM InfoSphere/WebSphere DataStage, Ascential DataStage, Big Data Hadoop and SSIS. Worked on DataStage client tools such as DataStage Designer, DataStage Director and DataStage Administrator.
- Around 3 years of working experience in Talend (ETL tool), developing and leading end-to-end implementations of Big Data projects; comprehensive experience as a Hadoop Developer in the Hadoop ecosystem: Hadoop, MapReduce, Hadoop Distributed File System (HDFS), Hive, Impala, YARN, Oozie, Hue and Spark.
- Expertise in working with various databases, writing SQL queries, stored procedures, functions and triggers using PL/SQL and SQL.
- Experience in NoSQL column-oriented databases such as Cassandra, HBase, MongoDB and FiloDB and their integration with the Hadoop cluster.
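A minimal Scala sketch of the accumulator usage noted above: a LongAccumulator counts malformed records across executors while the main job filters them out; the input path and validation rule are assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: count malformed records across all executors with an accumulator
// while the job keeps only well-formed rows. Input path and record format are assumed.
object AccumulatorCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("accumulator-count-sketch")
      .getOrCreate()
    val sc = spark.sparkContext

    val badRecords = sc.longAccumulator("badRecords")

    val lines = sc.textFile("hdfs:///datalake/raw/lob/input")
    val parsed = lines.flatMap { line =>
      val cols = line.split("\\|")
      if (cols.length == 5) Some(cols)
      else { badRecords.add(1); None } // incremented on executors, aggregated on the driver
    }

    // An action triggers execution; only afterwards is the accumulator value reliable.
    val goodCount = parsed.count()
    println(s"good=$goodCount bad=${badRecords.value}")
    spark.stop()
  }
}
```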
Environment: Java EE 6, IBM WebSphere Application Server 7, Apache-Struts 2.0, EJB 3, Spring 3.2, JSP 2.0, WebServices, JQuery 1.7, Servlet 3.0, Struts-Validator, Struts-Tiles, Tag Libraries, ANT 1.5, JDBC, Oracle 11g/SQL, JUNIT 3.8, CVS 1.2, Rational Clear Case, Eclipse 4.2, JSTL, DHTML.