- 8 years of extensive experience, including 4 years in Big Data across E-Commerce and Healthcare domains and 4 years in software development with ETL using Informatica.
- Hands-on experience using Hadoop technologies such as HDFS, Hive, Sqoop, and Impala.
- Hands-on experience writing MapReduce jobs through Hive and Pig.
- Experience importing and exporting data between external systems and HDFS using Sqoop, and using Hadoop ecosystem components to store and process data.
- Experience creating databases, tables, and views in HiveQL, Impala, and Pig Latin.
- Strong knowledge of MapReduce concepts.
- Around one year of experience with Spark and Scala.
- Hands-on experience working with ecosystem components such as Hive, Pig, and MapReduce.
- Strong knowledge of Hadoop, Hive, and Hive analytical functions.
- Efficient in building MapReduce programs using Hive and Pig.
- Involved in migrating data to the Hadoop stack from different databases (SQL Server 2008 R2, Oracle, and MySQL).
- Successfully loaded files into Hive and HDFS from MySQL.
- Loaded datasets into Hive for ETL operations.
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- Strong written, oral, interpersonal, and presentation communication skills.
- Ability to perform at a high level, meet deadlines, and adapt to ever-changing priorities.
- Extensive work experience with different SDLC approaches such as Waterfall and Agile development methodologies.
- Good communication and presentation skills.
- Ability to identify and resolve problems independently and quickly.
- Moved data between HDFS and relational databases in both directions using Sqoop (see the sketch after this list).
- Collected and aggregated large amounts of log data using Apache Flume, staging it in HDFS for further analysis.
- Commissioned and decommissioned nodes in an existing cluster.
- Analyzed and transformed data with Hive and Pig.
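A minimal sketch of the kind of Sqoop commands behind the HDFS-to-RDBMS transfers above; the host, database, table names, credentials, and paths are placeholders, not from an actual engagement.

```sh
# Import a MySQL table into HDFS (placeholder connection details)
sqoop import \
  --connect jdbc:mysql://dbhost:3306/salesdb \
  --username etl_user -P \
  --table orders \
  --target-dir /user/etl/orders \
  --num-mappers 4

# Export processed results from HDFS back into MySQL
sqoop export \
  --connect jdbc:mysql://dbhost:3306/salesdb \
  --username etl_user -P \
  --table order_summary \
  --export-dir /user/etl/order_summary
```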
Skills: Apache Hadoop, HDFS (Hadoop Distributed File System), Oracle, SQL, data warehousing, Informatica, Unix
Big Data/Hadoop Skills: HDFS, YARN, Sqoop, Flume, Pig, Hive; Spark: Spark Core, Spark Streaming, Spark SQL; NoSQL: HBase.
Programming Language: Java
Analytics Tools: Informatica, RDBMS, Oracle
Confidential - Reston, VA
- Analyzed large datasets to provide strategic direction to the company.
- Involved in analyzing the system and business requirements.
- Developed SQL statements to improve back-end communications.
- Loaded unstructured data into the Hadoop Distributed File System (HDFS).
- Created ETL jobs to load Twitter JSON data and server data into MongoDB, and moved the data from MongoDB into the data warehouse.
- Created reports and dashboards using structured and unstructured data.
- Involved in importing data from MySQL into HDFS using Sqoop.
- Involved in writing Hive queries to load and process data in HDFS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs (see the sketch after this list).
- Involved in working with Impala for data retrieval.
- Exported data from Impala to the Tableau reporting tool and created dashboards on a live connection.
- Performed sentiment analysis on product reviews from the client's website.
- Exported the resulting sentiment analysis data to Tableau to create dashboards.
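An illustrative example of the Hive DDL and queries described above; the table and column names are hypothetical, and the aggregation is the kind of query Hive compiles into MapReduce jobs.

```sql
-- External table over raw tweet data staged in HDFS
CREATE EXTERNAL TABLE tweets (
  id BIGINT,
  user_name STRING,
  text STRING,
  created_at STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/etl/tweets';

-- Aggregation that Hive executes as a MapReduce job
SELECT user_name, COUNT(*) AS tweet_count
FROM tweets
GROUP BY user_name
ORDER BY tweet_count DESC
LIMIT 10;
```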
Environment: Cloudera CDH 4.3, Hadoop, MapReduce, HDFS, Hive, MongoDB, Sqoop, MySQL, SQL, Impala, Tableau.
Confidential - Boston, MA
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Shared responsibility for administration of Hadoop, Hive, and Pig.
- Developed MapReduce programs to parse raw data, populate staging tables, and store the refined data in partitioned tables in the EDW.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first-mover advantage by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Clearly understood the client's business requirements for the risk-rating and reporting modules.
- Worked on cluster setup, building 2-node and 5-node clusters with the CDH3 distribution.
- Involved in predictive data analysis using the K-Means algorithm (a sketch follows this list).
- Coordinated discussions with the customer and functional teams as needed to gather inputs.
- Worked closely with technology counterparts to communicate the business requirements.
- Performed application design and database design.
- Prepared technical design documents.
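A minimal sketch of a K-Means clustering job of the kind used for the prediction analysis above, written against the Spark MLlib API listed in this environment; the input path, feature columns, and value of k are assumptions for illustration.

```java
import org.apache.spark.ml.clustering.KMeans;
import org.apache.spark.ml.clustering.KMeansModel;
import org.apache.spark.ml.feature.VectorAssembler;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class RiskClustering {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("KMeansRiskRating").getOrCreate();

        // Hypothetical input: numeric risk features staged in HDFS
        Dataset<Row> data = spark.read().parquet("/user/etl/risk_features");

        // Combine the feature columns into a single vector column
        VectorAssembler assembler = new VectorAssembler()
                .setInputCols(new String[]{"exposure", "claims", "tenure"})
                .setOutputCol("features");
        Dataset<Row> features = assembler.transform(data);

        // Cluster the records into k groups for the risk-rating analysis
        KMeans kmeans = new KMeans().setK(5).setSeed(42L);
        KMeansModel model = kmeans.fit(features);
        model.transform(features).show(10);

        spark.stop();
    }
}
```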
Environment: Java, machine learning, Cloudera, Apache Hadoop, HDFS, Hive, Pig, Apache Spark, Spark Streaming, Spark SQL, Scala, Git.
Confidential - San Francisco, CA
- Led the Big Data analytics solution project to load data from source systems into the client's modern analytics platform.
- Analyzed and ingested policy, claims, billing, and agency data into the client's solution through multiple stages.
- Wrote multiple MapReduce programs for the extraction, transformation, and aggregation of data from different sources in multiple file formats, including XML, JSON, CSV, and other compressed formats.
- Assisted with data capacity planning and node forecasting.
- Worked on importing and exporting data between Oracle and DB2 and HDFS/Hive using Sqoop, and automated the Sqoop jobs by scheduling them in Oozie.
- Created Hive scripts to load data from one stage to the next and implemented incremental loads following the changed-data architecture.
- Created Hive tables as internal or external per requirements, defined with appropriate static or dynamic partitions and bucketing for efficiency (see the DDL sketch after this list).
- Performed data analysis and queries with Hive and Pig on Ambari (Hortonworks).
- Enhanced Hive performance by implementing optimization and compression techniques.
- Implemented Hive partitioning and bucketing to improve query performance in the staging layer, a denormalized form of the analytics model.
- Implemented techniques for efficient execution of Hive queries, such as map joins, compressed map/reduce output, and parallel query execution.
- Issued SQL queries via Impala to process the data stored in HDFS and HBase.
- Planned and reviewed deliverables; assisted the team with development and deployment activities.
- Involved in cluster setup meetings with the administration team.
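An illustrative sketch of the partitioned, bucketed Hive staging tables and the session-level tuning described above; the table, column, and variable names are placeholders.

```sql
-- External staging table, partitioned by load date and bucketed by key
CREATE EXTERNAL TABLE staging_claims (
  claim_id  BIGINT,
  policy_id BIGINT,
  amount    DECIMAL(12,2)
)
PARTITIONED BY (load_date STRING)
CLUSTERED BY (policy_id) INTO 32 BUCKETS
STORED AS ORC
LOCATION '/data/staging/claims';

-- Session settings of the kind used to speed up query execution
SET hive.auto.convert.join=true;                -- map joins for small tables
SET hive.exec.compress.output=true;             -- compress map/reduce output
SET hive.exec.parallel=true;                    -- run independent stages in parallel
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Incremental load into a dynamic partition
INSERT INTO TABLE staging_claims PARTITION (load_date)
SELECT claim_id, policy_id, amount, load_date
FROM raw_claims
WHERE load_date = '${hiveconf:run_date}';
```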
Environment: Apache Hadoop 2.2.0, Hortonworks, MapReduce, Hive, HBase, HDFS, Pig, Sqoop, Flume, Impala, Spark, Oozie, Kafka, MongoDB, UNIX, shell scripting, XML, JSON.
Confidential - Memphis, TN
- Extracted and updated data in HDFS using the Sqoop import and export command-line utility.
- Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from web logs and store it in HDFS.
- Involved in developing Hive UDFs for needed functionality.
- Involved in creating Hive tables, loading them with data, and writing Hive queries.
- Managed work including indexing data, tuning relevance, developing custom tokenizers and filters, and adding functionality such as playlists, custom sorting, and regionalization with the Solr search engine.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Used Pig for transformations, event joins, filtering bot traffic, and some pre-aggregations before storing the data in HDFS.
- Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark.
- Enhanced and optimized the product's Spark code to aggregate, group, and run data mining tasks using the Spark framework.
- Extended Hive functionality by writing custom UDFs (see the Java sketch after this list).
- Experience in managing and reviewing Hadoop log files.
- Developed data pipeline using Flume, Sqoop, pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in emitting processed data from Hadoop to relational databases and external file systems using Sqoop.
- Orchestrated hundreds of Sqoop scripts, Pig scripts, and Hive queries using Oozie workflows and sub-workflows.
- Loaded cache data into HBase using Sqoop.
- Built custom Talend jobs to ingest, enrich, and distribute data in MapR and Cloudera Hadoop ecosystems.
- Created numerous external tables in Hive pointing to HBase tables.
- Analyzed HBase data in Hive by creating externally partitioned and bucketed tables.
- Worked with cache data stored in Cassandra.
- Ingested data from external and internal flow organizations.
- Used the external tables in Impala for data analysis.
- Supported MapReduce programs running on the cluster.
- Participated in Apache Spark POCs for analyzing sales data based on several business factors.
- Participated in daily scrum meetings and iterative development.
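A minimal sketch of a custom Hive UDF of the kind described above, using the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name and normalization logic are hypothetical.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF that normalizes free-text fields before joins.
// Registered in Hive with:
//   ADD JAR udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeText';
public class NormalizeText extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        // Lowercase and collapse runs of whitespace
        String cleaned = input.toString().toLowerCase().trim()
                .replaceAll("\\s+", " ");
        return new Text(cleaned);
    }
}
```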
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Impala, Sqoop, Oozie, Apache Spark, Java, Linux, SQL Server, Zookeeper, Tableau.
Confidential - Louisville, KY
- Developed ETL programs using Informatica to implement the business requirements.
- Communicated with business customers to discuss the issues and requirements.
- Created shell scripts to fine-tune the ETL flow of the Informatica workflows (see the sketch after this list).
- Used Informatica file-watch events to poll the FTP sites for the external mainframe files.
- Provided production support to resolve ongoing issues and troubleshoot problems.
- Performed performance tuning at the functional and mapping levels, using relational SQL wherever possible to minimize data transfer over the network.
- Effectively used Informatica parameter files for defining mapping variables, workflow variables, FTP connections, and relational connections.
- Involved in enhancements and maintenance activities of the data warehouse including tuning, modifying of stored procedures for code enhancements.
- Worked effectively in a version-controlled Informatica environment and used deployment groups to migrate objects.
- Used the Debugger to identify bugs in existing mappings by analyzing data flow and evaluating transformations.
- Used pre- and post-session assignment variables to pass variable values from one session to another.
- Designed workflows with many sessions using Decision, Assignment, Event-Wait, and Event-Raise tasks, and used the Informatica scheduler to schedule jobs.
- Reviewed and analyzed functional requirements and mapping documents; performed problem-solving and troubleshooting.
- Performed unit testing at various levels of the ETL and was actively involved in team code reviews.
- Identified problems in existing production data and developed one-time scripts to correct them.
- Fixed invalid mappings and troubleshot technical problems in the database.
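A minimal sketch of the kind of shell wrapper used to control Informatica workflows through the pmcmd command-line tool; the domain, service, folder, workflow, and parameter-file names are placeholders.

```sh
#!/bin/sh
# Start an Informatica workflow and fail the script if the run fails.
# All names below are illustrative, not from an actual repository.
PARAM_FILE=/opt/etl/params/wf_load_sales.par

pmcmd startworkflow \
  -sv INT_SVC -d DOMAIN_DEV \
  -u "$INFA_USER" -p "$INFA_PWD" \
  -f FOLDER_EDW \
  -paramfile "$PARAM_FILE" \
  -wait wf_load_sales

if [ $? -ne 0 ]; then
  echo "wf_load_sales failed" >&2
  exit 1
fi
```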