- 9+ years of IT experience in analysis, design, development, implementation, maintenance, and support, including Big Data, Hadoop development and ecosystem analytics, and the design and development of Java-based enterprise applications.
- Around 5 years of experience with Hadoop ecosystem components, including HDFS, MapReduce (MRv1, YARN), Pig, Hive, HBase, Sqoop, Flume, Kafka, Impala, and Oozie, along with programming in Spark using Scala and exposure to Cassandra.
- Good knowledge of Amazon Web Services (AWS) offerings such as EMR and EC2, which provide fast and efficient processing for Teradata big data analytics.
- Expertise in data development on the Hortonworks HDP platform and with Hadoop ecosystem tools such as HDFS, Spark, Zeppelin, Hive, HBase, Sqoop, Flume, Atlas, Solr, Pig, Falcon, Oozie, Hue, Tez, Apache NiFi, and Kafka.
- Very good experience with Apache Spark, Spark Streaming, Spark SQL, and NoSQL databases such as Cassandra and HBase.
- Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin, and Airflow.
- Strong knowledge of creating and monitoring Hadoop clusters on Amazon EC2, VMs, Hortonworks Data Platform 2.1 and 2.2, and CDH3 and CDH4 with Cloudera Manager, on Linux distributions such as Ubuntu.
- In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, and Spark MLlib.
- Expertise in writing Spark RDD transformations and actions, DataFrames, and case classes for the required input data, and in performing data transformations using Spark Core.
- Good knowledge of Hadoop Architecture and various components such as YARN, HDFS, NodeManager, ResourceManager, JobTracker, TaskTracker, NameNode, DataNode and MapReduce concepts.
- Strong knowledge of NoSQL databases such as HBase and Cassandra (column-oriented) and MongoDB (document-oriented), and of their integration with Hadoop clusters.
- Experience installing, configuring, supporting, and managing Hortonworks and Cloudera Hadoop platforms, including CDH3 and CDH4 clusters.
- Solid SQL skills, including writing complex SQL queries, functions, triggers, and stored procedures for back-end, database, and end-to-end testing.
- Experienced with Hadoop clusters on the Azure HDInsight platform; deployed data analytics solutions using Spark and BI reporting tools.
- Very good understanding of SQL, ETL, and data warehousing technologies, with sound knowledge of designing data warehousing applications using tools such as Teradata, Oracle, and SQL Server.
- Experience writing build scripts with Maven and working with continuous integration systems such as Jenkins.
- Expertise in using Kafka as a messaging system to implement real-time streaming solutions; implemented Sqoop for large data transfers from RDBMS to HDFS/HBase/Hive and vice versa.
- Experience migrating ETL processes into Hadoop, designing Hive data models, and writing Pig Latin scripts to load data into Hadoop.
- Good knowledge of Cloudera distributions and of AWS services, including Amazon Simple Storage Service (Amazon S3), Amazon EC2, and Amazon EMR.
- Good knowledge of Object-Oriented Analysis and Design (OOAD) and Java design patterns, with solid experience in Core Java and JEE technologies such as JDBC, Servlets, and JSP.
Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Impala, Oozie, Kafka, Spark, ZooKeeper, Storm, YARN, AWS.
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, JavaBeans
IDEs: Eclipse, NetBeans, IntelliJ
Frameworks: MVC, Struts, Hibernate, Spring
Databases: Oracle, MySQL, DB2, Teradata, MS SQL Server.
NoSQL Databases: HBase, Cassandra, MongoDB
Web Servers: WebLogic, WebSphere, Apache Tomcat
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
ETL Tools: Informatica, Talend
Web Development: HTML, DHTML, XHTML, CSS, JavaScript, AJAX
XML/Web Services: XML, XSD, WSDL, SOAP, Apache Axis, DOM, SAX, JAXP, JAXB, XMLBeans.
Methodologies/Design Patterns: OOAD, OOP, UML, MVC2, DAO, Factory pattern, Session Facade
Operating Systems: Windows, AIX, Sun Solaris, HP-UX.
Confidential, PORTLAND, OR
SR. BIGDATA ENGINEER
- Developed Pig scripts to perform analytics on JSON and XML data; created external and internal Hive tables with static and dynamic partitions and bucketed the tables to improve query efficiency.
- Used HiveQL to analyze the partitioned and bucketed data and compute various metrics for reporting; performed data transformations by writing MapReduce and Pig jobs as per business requirements.
- Used Apache Kafka to aggregate web log data from multiple servers and make it available to downstream systems for analysis; configured Spark Streaming to consume the Kafka streams and store the data in HDFS.
- Performed data analysis, feature selection, and feature extraction using Apache Spark machine learning and streaming libraries in Python.
- Set up and configured AWS EMR clusters and used AWS IAM to grant users fine-grained access to AWS resources.
- Enabled and configured Hadoop services such as HDFS, YARN, Hive, Ranger, HBase, Kafka, Sqoop, Zeppelin notebooks, and Spark/Spark2; analyzed log data with Apache Spark to predict errors.
- Evaluated deep learning algorithms for text summarization using Python, Keras, TensorFlow, and Theano on a Cloudera Hadoop system.
- Extracted real-time data using Kafka and Spark Streaming by creating DStreams, converting them into RDDs, processing them, and storing the results in Cassandra.
- Used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process web server logs stored in an Amazon S3 bucket.
- Used the DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting, and grouping.
- Used Sqoop to ingest data from DBMS sources and Python to ingest logs from client data centers; developed Python and Bash scripts for automation and implemented MapReduce jobs using the Java API and Python with Spark.
- Imported data from RDBMS systems like MySQL into HDFS using Sqoop and developed Sqoop jobs to perform incremental imports into Hive tables.
- Loaded and transformed large sets of structured and semi-structured data, created data pipelines as per business requirements, and scheduled them using Oozie coordinators.
- Worked with Amazon Web Services (AWS) cloud services such as EC2, S3, EBS, RDS, and VPC.
- Integrated MapReduce with HBase to bulk-import large amounts of data into HBase using MapReduce programs.
- Used Impala to write queries that fetch data from Hive tables and developed several MapReduce jobs using the Java API.
- Worked with Apache Solr to implement indexing and wrote custom Solr query segments to optimize search.
- Developed Pig and Hive UDFs to implement business logic for processing the data as per requirements; wrote the Pig UDFs in Java and used UDFs from Piggybank for sorting and preparing the data.
- Configured and optimized the Cassandra cluster and developed a real-time Java-based application to work with the Cassandra database.
- Created Hive tables, loaded data and wrote Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
Environment: Hadoop, Hive, HDFS, Pig, Sqoop, Python, Spark SQL, Machine Learning, MongoDB, AWS, AWS S3, AWS EC2, AWS EMR, Oozie, ETL, Tableau, Spark, Spark Streaming, Kafka, Apache Solr, Cassandra, Cloudera Distribution, Java, Impala, Web Servers, Maven, MySQL, Agile-Scrum.
Confidential, CHICAGO, IL
SR. BIGDATA ENGINEER
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and Spark on YARN.
- Worked on cloud computing infrastructure (e.g., Amazon Web Services EC2) and design considerations for scalable, distributed systems.
- Used GoCD (a CI/CD tool) to deploy applications; experienced with the Munin framework for big data testing.
- Moved files between HDFS and AWS S3, worked extensively with S3 buckets, and converted all Hadoop jobs to run on EMR by configuring the cluster according to the data size.
- Demonstrated Hadoop practices and broad knowledge of technical solutions, design patterns, and code for medium/complex applications deployed in Hadoop production.
- Wrote Spark applications for data validation, cleansing, transformations, and custom aggregations; imported data from different sources into Spark RDDs for processing; developed custom aggregate functions using Spark SQL and performed interactive querying.
- Worked extensively with Avro and Parquet files and converted data between the two formats; parsed semi-structured JSON data and converted it to Parquet using DataFrames in Spark.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala, and used Sqoop for importing and exporting data between RDBMS and HDFS.
- Imported data from different sources, such as AWS S3 and the local file system, into Spark RDDs.
- Collected data from an AWS S3 bucket in near real time using Spark Streaming, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.
- Transferred relational database and legacy tables to HDFS and HBase tables using Sqoop, and vice versa.
- Processed web server logs by developing multi-hop Flume agents using the Avro sink and loaded them into MongoDB for further analysis; worked on MongoDB NoSQL data modeling, tuning, disaster recovery, and backup.
- Developed data pipeline using Spark, Hive and HBase to ingest customer behavioral data and financial histories into Hadoop cluster for analysis.
- Developed a Python script to load CSV files into S3 buckets; created AWS S3 buckets, performed folder management in each bucket, and managed logs and objects within each bucket.
- Worked with file formats such as JSON, Avro, and Parquet and compression codecs such as Snappy; developed Python code for tasks, dependencies, SLA watchers, and time sensors for each job for workflow management and automation using Airflow.
- Developed shell scripts for adding dynamic partitions to the Hive staging table, verifying JSON schema changes in source files, and checking for duplicate files in the source location.
- Monitored and troubleshot Hadoop jobs using the YARN ResourceManager and EMR job logs using Genie and Kibana.
- Imported metadata into Hive using Python, migrated existing tables and applications to Hive and the AWS cloud (S3), and made the data available in Athena and Snowflake.
- Used Stash (Bitbucket) with Git extensively for source control and worked with Airflow, Amazon Elastic MapReduce (EMR), Athena, and Snowflake.
Environment: Spark, AWS, EC2, EMR, Hive, SQL Workbench, Genie Logs, Kibana, Sqoop, Spark SQL, Spark Streaming, Scala, Python, Hadoop (Cloudera Stack), Hue, Kafka, HBase, HDFS, Pig, Oracle, ETL, AWS S3, Git.
Confidential, CINCINNATI, OH
- Loaded disparate data sets from different sources into the BDPaaS (Hadoop) environment using Spark.
- Developed UNIX scripts to create batch loads that bring large amounts of data from relational databases to the Big Data platform.
- Delivery experience with major Hadoop ecosystem components such as Pig, Hive, Spark, Kafka, Elasticsearch, and HBase, with monitoring through Cloudera Manager.
- Used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process web server logs stored in an Amazon S3 bucket.
- Implemented machine learning algorithms using Spark with Python and worked with Spark, Storm, and Apache Apex.
- Analyzed data coming from various sources and created meta-files and control files to ingest the data into the Data Lake.
- Configured batch jobs to ingest the source files into the Data Lake and developed Pig queries to load data into HBase.
- Used Hive queries to create ORC tables and developed Hive scripts to meet analysts' requirements.
- Implemented Kafka consumers to move data from Kafka partitions into Cassandra for near-real-time analysis; worked extensively with Hive to create, alter, and drop tables and to write Hive queries.
- Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig; translated high-level design specifications into simple ETL coding and mapping standards.
- Created and altered HBase tables on top of data residing in the Data Lake and created external Hive tables on the blobs to expose the data to the Hive Metastore.
- Took part in the requirements and design phases to implement a streaming architecture for real-time processing using Spark and Kafka.
- Used the Spark API for machine learning: translated a predictive model from SAS code to Spark and used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Created Reports with different Selection Criteria from Hive Tables on the data residing in Data Lake.
- Worked on Hadoop Architecture and various components such as YARN, HDFS, NodeManager, ResourceManager, JobTracker, TaskTracker, NameNode, DataNode and MapReduce concepts.
- Deployed components such as Hive, HBase, Spark, and Scala on the Hadoop cluster as required.
- Uploaded and processed terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop.
- Implemented the business rules in Spark/Scala to put the business logic in place for running the Rating Engine.
- Used Spark UI to observe the running of a submitted Spark Job at the node level and used Spark to do Property Bag Parsing of the data to get the required fields of data.
- Extensively used ETL methodology to support data extraction, transformation, and loading using Hadoop.
- Used both the HiveContext and the SQLContext of Spark for initial testing of the Spark job; used WinSCP and FTP to view the data storage structure on the server and to upload the JARs used for spark-submit.
- Developed code from scratch in Spark using Scala according to the technical requirements.
Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Sqoop, Spark, Scala, MapR, Core Java, R, SQL, Python, Eclipse, Linux, Unix, HDFS, Impala, Cloudera, Kafka, Apache Cassandra, Oozie, ZooKeeper, MySQL, PL/SQL
Confidential, DALLAS, TX
SR. JAVA DEVELOPER
- Involved in the design and development phases of Agile Software Development and analyzed current Mainframe system and designed new GUI screens.
- Developed the application using a 3-tier architecture (presentation, business, and data integration layers) in accordance with customer/client standards.
- Played a vital role in the Scala framework for web-based applications and used FileNet for content management and for streamlining business processes.
- Created responsive layouts for multiple devices and platforms using the Foundation framework and implemented a printable chart report using HTML, CSS, and jQuery.
- Created managed beans for handling JSF pages, including logic for processing the data on the page, and created a simple user interface for the application's configuration system using MVC design patterns and the Swing framework.
- Used the object/relational mapping tool Hibernate to achieve object-to-database-table persistence.
- Developed the web GUI using HTML and JavaScript under an MVC architecture; created WebLogic domains and set up Admin and Managed servers for Java/J2EE applications in non-production and production environments.
- Configured the WebSphere application server to connect to DB2, Oracle, and SQL Server back ends by creating JDBC data sources, and configured MQ Series with IBM RAD and WAS to create new connection factories and queues.
- Worked extensively with TOAD for interacting with the database, developing stored procedures, and promoting SQL changes to QA and production environments.
- Used Apache Maven for project management and building the application, and CVS for version management.
- Involved in the configuration of Spring Framework and Hibernate mapping tool and monitoring WebLogic/JBoss Server health and security.
- Created connection pools and data sources in the WebLogic console and implemented Hibernate for database transactions on DB2.
- Configured Hibernate to access and retrieve data from the database and wrote web services (JAX-WS) for external systems called via SOAP over HTTP.
- Used Log4j framework to log/track application and involved in developing SQL queries, stored procedures, and functions.
- Created and updated build scripts using Ant for deployment; tested and deployed the application on the WAS server and used Rational ClearCase for version control.
- Involved in gathering and analyzing system requirements and played key role in the high-level design for the implementation of this application.
- Mavenized the existing applications, adding the required JAR files as dependencies in the pom.xml file, and used the JSF and Struts frameworks to interact with the front end.
- Utilized Swing/JFC framework to develop client side components and developed J2EE components on Eclipse IDE.
- Developed a new CR screen from the existing screen for the LTL loads (Low Truck Load) using JSF.
- Used Spring framework configuration files to manage objects and achieve dependency injection.
- Implemented cross-cutting concerns such as logging and monitoring mechanisms using Spring AOP.
- Implemented SOA architecture with web services using SOAP, WSDL, UDDI and XML and made screen changes to the existing screen for the LTL (Low Truck Load) Accessories using Struts.
- Developed desktop interface using Java Swing for maintaining and tracking products.
- Used JAX-WS to access external web services, retrieve the XML responses, and convert them back to Java objects.
- Developed the application using the Eclipse IDE in an Agile environment and worked with the web admin and admin teams to configure the application on development, training, test, and stress environments (WebLogic Server).
- Built PL/SQL functions, stored procedures, and views, and configured the Oracle database as a JNDI data source with connection pooling enabled.
- Used Hibernate based persistence classes at data access tier and adopted J2EE design patterns like Service Locator, Session Facade and Singleton.
- Worked on Spring Core layer, Spring ORM, Spring AOP in developing the application components.
- Modified web pages using JSP and used the Struts Validation Framework for form input validation.
- Created the WSDL, used Apache Axis 2.0 to publish it, and created PDF files for storing the data required for the module.
- Built custom components using JSTL tags and tag libraries with Struts, used WebLogic Server for deploying the WAR files, and used TOAD for DB2 database changes.
Environment: Java, J2EE, JSF, Hibernate, Struts, Spring, Swing/JFC, JSP, HTML, XML, WebLogic Server, iText, DB2, Eclipse IDE, SOAP, Maven, JSTL, TOAD, JDK, WSDL, JAX-WS, Apache Axis.