- Over 8 years of experience across Big Data, Scala, Java and Python, including extensive work with Big Data technologies and the development of web applications in multi-tiered environments using Confluent Kafka 4.0, Hadoop 2.0, Hive 1.2.2, Pig 0.16, Spark 2.1.1, Scala, Java, Spring Framework 5.0, MongoDB and Sqoop 1.4.6.
- Worked for 4 years with the Big Data/Hadoop ecosystem implementing a Data Lake.
- Hands-on experience with the Hadoop framework and its ecosystem, including the Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, Sqoop, Flume, Spark and Cassandra.
- Experience across the layers of the Hadoop framework: storage (HDFS), analysis (Pig and Hive) and engineering (jobs and workflows), including extending functionality with custom UDFs.
- Extensive experience developing data warehouse applications using Hadoop, Informatica, Oracle, Teradata and MS SQL Server on UNIX and Windows platforms, including creating complex mappings with various transformations and developing Extraction, Transformation and Loading (ETL) strategies with Informatica 9.x/8.x.
- Proficient in Hive Query Language (HiveQL) and experienced in Hive performance optimization using static partitioning, dynamic partitioning, bucketing and parallel execution.
- As a Data Architect, designed and maintained high-performance ELT/ETL processes.
- Experience analyzing data using HiveQL, Pig Latin, custom MapReduce programs in Java and custom UDFs.
- Good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce concepts.
- Knowledge of cloud computing infrastructure on Amazon Web Services (AWS).
- Created modules for streaming data into the Data Lake using Storm and Spark.
- Experience in dimensional data modeling (star schema, snowflake schema, fact and dimension tables) and in concepts such as Lambda Architecture and batch processing with Oozie.
- Extensively used Informatica client tools: Source Analyzer, Warehouse Designer, Mapping Designer, Mapplet Designer, ETL transformations, Informatica Repository Manager, Informatica Server Manager, Workflow Manager and Workflow Monitor.
- Expertise in core Java, J2EE, multithreading, JDBC and shell scripting; proficient in Java APIs such as Collections, Servlets and JSP for application development.
- Worked closely with Dev and QA teams to review pre- and post-processed data and ensure data accuracy and integrity.
- Experience with Java, J2EE, JDBC, Collections, Servlets, JSP, Struts, Spring, Hibernate, JSON, XML, REST and SOAP web services, Groovy, MVC, Eclipse, and WebLogic, WebSphere and Apache Tomcat servers.
- Extensive knowledge of data modeling, data conversion, data integration and data migration, with specialization in Informatica PowerCenter.
- Expertise in extracting, transforming and loading data from heterogeneous systems such as flat files, Excel, Oracle, Teradata and MS SQL Server.
- Good working experience with UNIX/Linux commands, scripting and deploying applications to servers.
- Strong skills in algorithms, data structures, object-oriented design, design patterns, documentation and QA/testing.
- Experienced working on fast-paced Agile teams, with exposure to testing in Scrum teams.
- Excellent domain knowledge in Insurance, Telecom and Banking/Finance.
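The Hive partitioning and bucketing techniques listed above can be illustrated with a small Python sketch of how such a table is laid out: each partition column becomes a `column=value` directory level, and within a partition rows are spread across a fixed number of bucket files by hashing the bucketing column. This is a simulation under stated assumptions, not Hive code: the table and column names are hypothetical, the `bucket_NNNNN` file naming is invented for readability, and `zlib.crc32` is a deterministic stand-in, not Hive's actual bucketing hash.

```python
import zlib
from collections import defaultdict

# Hypothetical table layout: partitioned by (dt, retailer), bucketed on customer_id.
PARTITION_COLS = ["dt", "retailer"]
BUCKET_COL = "customer_id"
NUM_BUCKETS = 4


def partition_path(row: dict) -> str:
    """Build a Hive-style partition directory such as 'dt=2017-01-01/retailer=acme'."""
    return "/".join(f"{col}={row[col]}" for col in PARTITION_COLS)


def bucket_id(value, num_buckets: int = NUM_BUCKETS) -> int:
    """Assign a bucket by hashing the bucketing column.

    zlib.crc32 is only a deterministic stand-in; Hive uses its own hash function.
    """
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def layout(rows):
    """Group rows as a dynamic-partition insert would: partition dir -> bucket -> rows."""
    files = defaultdict(lambda: defaultdict(list))
    for row in rows:
        files[partition_path(row)][bucket_id(row[BUCKET_COL])].append(row)
    return files


rows = [
    {"dt": "2017-01-01", "retailer": "acme", "customer_id": 1, "amount": 10.0},
    {"dt": "2017-01-01", "retailer": "acme", "customer_id": 2, "amount": 5.5},
    {"dt": "2017-01-02", "retailer": "zenith", "customer_id": 1, "amount": 7.25},
]

for path, buckets in sorted(layout(rows).items()):
    for b, bucket_rows in sorted(buckets.items()):
        # Hypothetical file naming, used only to make the layout visible.
        print(f"{path}/bucket_{b:05d}: {len(bucket_rows)} row(s)")
```

The payoff of this layout is that a query filtering on `dt` and `retailer` only needs to read the matching directories (partition pruning), while tables bucketed the same way on `customer_id` can be joined bucket by bucket.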
Big Data Technologies: Hortonworks HDP, Hadoop, MapReduce, Pig, Hive, Apache Spark, SQL, Informatica PowerCenter 9.6.1/8.x, HBase/Cassandra, Kafka, Kibana, Storm, NoSQL, Elastic MapReduce (EMR), Tez, Impala, Hue, YARN, Mesos.
Databases: Hortonworks HDP, Oracle 10g/11g, Teradata, DB2, Microsoft SQL Server, MySQL, MongoDB, NoSQL, SQL databases.
Platforms (O/S): Red Hat Linux, Ubuntu, Windows NT/2000/XP.
Programming Languages: Java, Scala, SQL, UNIX shell script, JDBC, Python, Perl.
Security Management: Hortonworks Ambari, Cloudera Manager, Apache Knox, XA Secure, Kerberos.
Data Warehousing: Informatica PowerCenter/PowerMart/Data Quality/Big Data, Pentaho, ETL Development, Amazon Redshift, IDQ.
Database Tools: JDBC, Hadoop, Hive, NoSQL, SQL Navigator, SQL Developer, TOAD, SQL*Plus, SAP Business Objects
Data Modeling: Rational Rose, Erwin 7.3/7.1/4.1/4.0
Code Editors: Eclipse, IntelliJ, Spark Eclipse
Confidential, Tampa, FL
Big Data Developer
- Prepared the ETL design document covering database structure, change data capture, error handling, and restart and refresh strategies.
- Created mapping documents outlining data flow from sources to targets.
- Worked with data feeds in formats such as JSON, CSV, XML and DAT, and implemented the Data Lake concept.
- Involved in dimensional modeling (star schema) of the data warehouse and used Erwin to design the business processes, dimensions and measured facts.
- Developed Informatica design mappings using various transformations.
- Maintained end-to-end ownership of analyzed data and developed frameworks, and of the implementation, build-out and communication of a range of customer analytics projects.
- Gained good exposure to IRI’s end-to-end analytics service engine and new big data platform (Hadoop loader framework, Big Data Spark framework, etc.).
- Most of the infrastructure is on AWS: AWS EMR as the Hadoop distribution, AWS S3 for raw file storage and AWS EC2 for running Kafka.
- Used AWS Lambda to perform data validation, filtering, sorting and other transformations for every data change in a DynamoDB table, and to load the transformed data into another data store.
- Programmed ETL functions between Oracle and Amazon Redshift.
- Used a Kafka producer to ingest raw data into Kafka topics and ran the Spark Streaming application to process clickstream events.
- Performed data analysis and predictive data modeling.
- Explored clickstream event data with Spark SQL.
- Optimized the configuration of Amazon Redshift clusters, data distribution and data processing.
- Designed the architecture and carried out the hands-on production implementation of the big data MapR Hadoop solution for Digital Media Marketing using telecom data, shipment data, point-of-sale (POS) data, and exposure and advertising data related to Consumer Product Goods.
- Used Spark SQL, part of the Apache Spark big data framework, to process structured data: shipment, POS, consumer, household, individual digital impressions and household TV impressions data.
- Created DataFrames from data sources such as existing RDDs, structured data files, JSON datasets, Hive tables and external databases.
- Loaded terabytes of raw data at different levels into Spark RDDs for computation to generate the output response.
- Imported data from HDFS into Spark RDDs.
- Used HiveContext, which provides a superset of the functionality of SQLContext, and preferred writing queries with the HiveQL parser to read data from Hive tables (fact, syndicate).
- Modeled Hive partitions extensively for data separation and faster data processing and followed Hive best practices for tuning.
- Cached RDDs for better performance and performed actions on each RDD.
- Created Hive fact tables on top of raw data from different retailers, partitioned by IRI time dimension key, retailer name and data supplier name, which were further processed and pulled by the analytics service engine.
- Developed complex, maintainable and easy-to-use Python and Scala code for data processing and analytics that satisfies application requirements, using built-in libraries.
- Loaded data into Hive and HDFS from Oracle and SQL Server using Sqoop.
- Currently overseeing the migration of major data assets from IRI’s Oracle Exadata data store to the MapR Hadoop platform.
- Leading a major new initiative focused on media analytics and forecasting, which will be able to deliver the sales lift associated with customer marketing campaign initiatives.
- Responsibilities include platform specification and redesign of load processes, as well as projections of future platform growth.
- Managing the onshore and offshore development teams: project planning, work breakdown, budgeting, etc.
- Coordinating deployments to the QA and PROD environments.
- Used Python to automate Hive jobs and to read configuration files.
- Used Spark for fast data processing, with both the Spark shell and a Spark standalone cluster.
- Used Hive to analyze the partitioned data and compute various metrics for reporting.
Environment: Hadoop, MapReduce, HDFS, Hive, Python, Scala, Kafka, Spark Streaming, Spark SQL, MongoDB, ETL, Oracle, Informatica 9.6, SQL, MapR, Sqoop, ZooKeeper, AWS EMR, AWS S3, AWS EC2, Control-M scheduler, D3.js, Jenkins, Git, JIRA, Unix/Linux, Agile Methodology, Scrum.
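The DynamoDB-triggered Lambda processing in this role (validation, filtering, sorting and transformation for every table change) can be sketched in Python. The sketch assumes the function is subscribed to the table's DynamoDB stream, so each invocation receives a batch of stream records in the standard `Records` / `eventName` / `dynamodb.NewImage` shape. The `order_id`, `amount` and `status` attributes and the validation rules are hypothetical, only the `S`, `N` and `BOOL` attribute types are handled, and the actual load into the downstream data store is left out.

```python
from decimal import Decimal


def _unmarshal(attr: dict):
    """Convert one DynamoDB attribute value such as {"S": "abc"} or {"N": "42"}.

    Only the S, N and BOOL types are handled in this sketch.
    """
    if "S" in attr:
        return attr["S"]
    if "N" in attr:
        return Decimal(attr["N"])  # DynamoDB sends numbers as strings
    if "BOOL" in attr:
        return attr["BOOL"]
    raise ValueError(f"unsupported attribute type: {list(attr)}")


def transform_records(event: dict) -> list:
    """Validate, filter, transform and sort the new images in a DynamoDB Streams event."""
    out = []
    for record in event.get("Records", []):
        # Only inserts and updates carry a NewImage; deletions are skipped here.
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue
        image = record.get("dynamodb", {}).get("NewImage")
        if not image:
            continue
        item = {name: _unmarshal(value) for name, value in image.items()}
        # Validation: hypothetical rule requiring an order_id and a non-negative amount.
        if "order_id" not in item or item.get("amount", Decimal(0)) < 0:
            continue
        # Transformation: hypothetical normalization of the status field.
        item["status"] = str(item.get("status", "unknown")).lower()
        out.append(item)
    # Sorting: order the batch by order_id before loading it downstream.
    return sorted(out, key=lambda i: i["order_id"])


def lambda_handler(event, context):
    items = transform_records(event)
    # Writing `items` to the target data store (for example with boto3) is omitted here.
    return {"processed": len(items)}
```

Keeping the transformation in a plain function separate from the handler makes it easy to test locally with a hand-built event before wiring in the real sink.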
Confidential, Austin, TX
Big Data/Hadoop Developer
- Understood the requirements and prepared the architecture document for the Big Data project.
- Worked with the Hortonworks distribution.
- Supported MapReduce Java programs running on the cluster.
- Optimized Amazon Redshift clusters, Apache Hadoop clusters, data distribution and data processing.
- Developed MapReduce programs to process Avro files and produce results by performing calculations on the data, and implemented map-side joins.
- Bulk-imported data into HBase using MapReduce programs.
- Programmed ETL functions between Oracle and Amazon Redshift.
- Used the HBase REST API to access HBase data for analytics.
- Performed analytics on time-series data stored in Cassandra using the Cassandra API.
- Designed and implemented Incremental Imports into Hive tables.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Involved in collecting, aggregating and moving data from servers to HDFS using Flume.
- Imported and exported data between HDFS and relational data sources such as DB2, SQL Server and Teradata using Sqoop.
- Migrated complex MapReduce programs to in-memory Spark processing using transformations and actions.
- Collected real-time data from Kafka using Spark Streaming, performed transformations and aggregations on the fly to build the common learner data model, and persisted the data into HBase.
- Worked on a proof of concept (POC) for IoT device data with Spark.
- Used Scala to store streaming data in HDFS and to implement Spark jobs for faster data processing.
- Created RDDs and DataFrames for the required input data and performed data transformations using PySpark.
- Developed Spark SQL queries and DataFrames to import data from data sources, perform transformations and read/write operations, and save the results to an output directory in HDFS.
- Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
- Developed Pig scripts for the analysis of semi-structured data.
- Developed Pig UDFs to manipulate data according to business requirements, and developed custom Pig loaders.
- Used the Oozie workflow engine for job scheduling and developed Oozie workflows to schedule and orchestrate the ETL process.
- Managed and reviewed Hadoop log files using shell scripts.
- Migrated ETL jobs to Pig scripts that perform transformations, joins and some pre-aggregations before storing the data in HDFS.
- Worked with file formats such as SequenceFiles, XML files and MapFiles using MapReduce programs.
- Used the Avro data serialization system to work with JSON data formats.
- Used Amazon Web Services (AWS) S3 to store large amounts of data in a single repository.
- Built applications using Maven and integrated them with continuous integration servers such as Jenkins to run build jobs.
- Used the Enterprise Data Warehouse database to store information and make it accessible across the organization.
- Responsible for preparing technical specifications, analyzing functional specs, and developing and maintaining code.
- Worked with the Data Science team to gather requirements for various data mining projects.
- Wrote shell scripts to automate day-to-day processes.
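A map-side (replicated) join, as used in the MapReduce work in this role, avoids the shuffle by loading the smaller dataset into memory in every mapper and joining each record of the large dataset as it streams past. The plain-Python sketch below simulates that pattern; it is not Hadoop API code, and the store and sales record layouts are hypothetical. In an actual Hadoop job the small table would typically be shipped to the mappers through the distributed cache and loaded in the mapper's `setup()` method.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical small dimension table (store_id -> region), replicated to every mapper.
STORES_CSV = """store_id,region
s1,south
s2,west
"""

# Hypothetical large fact input, which a real cluster would split across mappers.
SALES_LINES = ["s1,100", "s2,40", "s3,75", "s1,20"]


def load_lookup(text: str) -> Dict[str, str]:
    """Load the small table into an in-memory dict, as a mapper's setup step would."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["store_id"]: row["region"] for row in reader}


def map_side_join(lines: Iterable[str], lookup: Dict[str, str]) -> Iterator[Tuple[str, str, int]]:
    """Inner-join each fact record against the lookup with no shuffle or reduce phase."""
    for line in lines:
        store_id, amount = line.split(",")
        region = lookup.get(store_id)
        if region is None:
            continue  # inner join: drop facts with no matching dimension row
        yield store_id, region, int(amount)


lookup = load_lookup(STORES_CSV)
for record in map_side_join(SALES_LINES, lookup):
    print(record)
```

The trade-off is memory: this pattern only works when the replicated side fits comfortably in each mapper's heap; otherwise a reduce-side join is the usual fallback.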
Confidential, Dallas, TX
Big Data/Hadoop Developer
- Primary responsibilities included building scalable distributed data solutions using the Hadoop ecosystem.
- Loaded datasets from Teradata to HDFS and Hive on a daily basis.
- Developed complex MapReduce streaming jobs in Java, alongside jobs implemented using Hive and Pig.
- Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Teradata into HDFS using Sqoop.
- Analyzed the data by running Hive queries (HiveQL) and Pig Latin scripts to study customer behavior.
- Used Impala to query the Hadoop data stored in HDFS.
- Worked with NoSQL databases, including HBase.
- Analyzed large datasets to determine the optimal way to aggregate and report on them.
- Wrote MapReduce programs in Java for data extraction, transformation and aggregation across multiple file formats, including XML, JSON and CSV.
- Interacted with clients and business users. Involved in requirement gathering and impact analysis.
- The Monthly Billing Process (MBP) application runs every month and includes processes for creating customer invoices and usage reports.
- Fiberlink invoices customers for Connectivity Charges (Dial, Wi-Fi and Broadband), Software License Fee (E360 and Third Party Apps) and Monthly Recurring Charges (Custom Reports, Managed Services).
- The process consists of a set of automated and manual steps managed by the Billing and Finance teams.
- Member of a development team responsible for the design, development and testing of server-based software that provides secure mobile workforce solutions.
- Created draft technical design documentation.
- Enrollment is the process of registering a handheld device with the portal; managing a device includes collecting information about it.
- The enrollment part of the application can notify users by SMS and email.
- End users submit these requests to register their devices with the portal so that the administrator can manage them. Usability features include QR code integration in the emails sent to users, so they can register a device with little effort.
- Created conceptual, logical and physical data models.
- Wrote stored procedures, functions and triggers for various operations.
- Tuned the performance of queries and scripts.
- Used BCP to import data into staging tables.
- Provided post-production user support.
- Coordinated with the Quality Assurance/Testing team members to perform both SIT and UAT testing.
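The MapReduce extraction and aggregation work over mixed file formats (XML, JSON, CSV) in this role can be illustrated with a small in-process simulation of the map, shuffle and reduce phases in Python. It is a sketch, not Hadoop code: the `customer`/`usage` record layouts in each format are hypothetical, and a real job would implement the same map and reduce logic in Java `Mapper` and `Reducer` classes while the framework performs the shuffle.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_record(line: str, fmt: str) -> Iterator[Tuple[str, int]]:
    """Map phase: parse one input line in its own format and emit (key, value) pairs."""
    if fmt == "csv":
        customer, usage = next(csv.reader(io.StringIO(line)))
        yield customer, int(usage)
    elif fmt == "json":
        obj = json.loads(line)
        yield obj["customer"], int(obj["usage"])
    elif fmt == "xml":
        elem = ET.fromstring(line)
        yield elem.get("customer"), int(elem.get("usage"))
    else:
        raise ValueError(f"unsupported format: {fmt}")


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_sum(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: aggregate all values for one key."""
    return key, sum(values)


def run_job(inputs: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Run map, shuffle and reduce over (line, format) inputs and return per-key totals."""
    mapped = (pair for line, fmt in inputs for pair in map_record(line, fmt))
    return dict(reduce_sum(k, v) for k, v in sorted(shuffle(mapped).items()))


inputs = [
    ("acme,120", "csv"),
    ('{"customer": "zenith", "usage": 30}', "json"),
    ("acme,80", "csv"),
    ('{"customer": "acme", "usage": 5}', "json"),
    ('<record customer="zenith" usage="10"/>', "xml"),
]
print(run_job(inputs))
```

Because each mapper normalizes its own input format into the same (key, value) shape, the reduce logic stays identical no matter how many source formats are added.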
Java Designer and Developer
Environment: Java, J2EE, Struts, JSP, Servlets, MS SQL Server, Oracle 9i, Windows, UNIX.
- As a Software Engineer, involved in designing the business layer and data management components using MVC frameworks such as Struts, with Java/J2EE.
- Performed requirement analysis for enhancements to the application.
- Identified other source systems, such as Oracle, along with their connectivity, related tables and fields, and ensured data integration for the job.
- Prepared project closure reports.
- Wrote JUnit test cases for all components in the product.
Designer and Developer
- Performed requirement analysis.
- Carried out research and development of this Confidential management system.
- Designed and developed the modules for writing test cases.
Technologies Used: Java, J2EE, Struts, MS-SQL Server.
Tools: Eclipse IDE.