- 8+ years of professional IT experience, including Big Data ecosystem experience in ingestion, storage, querying, processing, and analysis of big data.
- Mainly focused on big data with BI/Data warehouse projects.
- Hands-on experience in installing, configuring, and using ecosystem components such as Hadoop MapReduce, HDFS, YARN, Hive, Sqoop, Pig, Oozie, AWS, Cloudera, and Hortonworks.
- Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, and DataNode, along with MapReduce concepts and the HDFS framework.
- Experience in database design, entity relationships, database analysis, and SQL programming, including PL/SQL stored procedures, packages, and triggers in Oracle and SQL Server on Windows and Linux.
- Performed Hive tuning activities and worked extensively with Oozie and Sqoop.
- Strong understanding of data warehouse concepts and ETL, with data modeling experience in normalization, business process analysis, reengineering, dimensional data modeling, and physical and logical data modeling.
- Worked in a hybrid role as Big Data developer and Jaspersoft reporting developer.
- Good knowledge of development methodologies such as Agile, Waterfall, and Scrum.
- Worked on a POC with NoSQL databases such as HBase and Cassandra to address latency issues.
- Member of the core team that selected the reporting tool; analyzed emerging tools such as Qlik Sense, Datameer, and Platfora.
- Experience in writing Bash shell scripts for process automation of databases, applications, backups, and scheduling.
- Familiar with real-time streaming data using Spark and Kafka.
- Strong analytical skills with the ability to quickly understand clients' business needs. Participated in meetings to gather information and requirements from clients. Led the team and handled onsite/offshore coordination.
- Research-oriented, motivated, proactive, self-starter with strong technical, analytical and interpersonal skills.
Big Data Technologies: Hadoop, HDFS, YARN, Hive, MapReduce, Pig, Sqoop, Oozie, EC2, S3, EMR, OFSAA, Redshift
Distributions: Cloudera, AWS, Hortonworks
Reporting Tools: Jaspersoft, Qlik Sense, Tableau, MicroStrategy
Scripting Languages: Python, Shell
Programming Languages: C, C++, Java
Application Servers: WebLogic Server, Apache Tomcat
DB Languages: SQL, PL/SQL, Postgres, ParAccel
Databases/ETL: Oracle 9i/10g/11g, MySQL 5.2, DB2, Informatica 8.x
NoSQL Databases: HBase, Cassandra, MongoDB
Operating Systems: Linux, UNIX, Windows 2003 Server
Confidential, Tampa, FL
Sr. Big Data Developer
- Worked on two projects concurrently and managed onshore/offshore teams.
- Extensively involved in design and architecture layout.
- Participated in technology and architecture meetings across all regions and provided input to other teams.
- Engaged with business analysts to understand business requirements and translate them into functional specifications and technical designs, ensuring full requirements traceability.
- Converted OFSAA and Autosys jobs to Oozie workflows and trained the teams.
- Worked extensively on Hive tuning for performance improvement.
- Worked with teams spread across many countries and time zones.
- Experienced in handling different data formats in Hive using Hive SerDes and compression techniques.
- Extensively worked on Hive tables, partitions and buckets for analyzing large volumes of data.
- Worked extensively on Oozie, stepped into an admin role for further analysis, and helped teammates troubleshoot issues.
- Created UDFs for Hive/Pig/MapReduce jobs and wrote standalone Java jobs.
- Implemented a data lake, with Pig/Hive scripts for data processing and Sqoop/Talend for data ingestion.
- Worked with Spark SQL and Spark Streaming on a POC for real-time processing.
- Created a process to automate data ingestion, followed by data analysis and validation after ingestion.
- Involved in data migration across dev, UAT, SIT, and prod environments and supported deployments.
- Created data marts for analytical teams and worked closely with business owners/end customers.
- Created dashboard reports using Tableau/Jaspersoft.
- Served as point of contact for new source ingestion, including designing and planning the workflow.
Environment: Cloudera Manager, Hive, Pig, MapReduce, Oozie, OFSAA, Autosys, Sqoop, Talend, Jaspersoft, Tableau, Zeppelin, Tortoise, Spark SQL
Confidential, Atlanta, GA
Sr. Big Data Engineer
- Worked in a hybrid role as Big Data developer and reporting developer.
- Served as a senior technical resource and member of the Big Data Center of Excellence (COE), creating technical guidance, road maps, and strategies for delivering big data solutions throughout the organization.
- Wrote technical design documents, deployment documents, and supporting documents.
- Proposed ways to reduce project cost and worked on a spike for moving from Cloudera to Amazon.
- Initiated an agnostic data ingestion concept that automates scripts based on schema changes.
- Provided support during deployments and helped the admin team with endpoint changes.
- Performed daily status checks on Oozie workflows and implemented changes when tweaks were needed; monitored Cloudera Manager and checked DataNode status to ensure nodes were up and running.
- Wrote standalone Java jobs and Java MapReduce jobs.
- Experienced in handling different data formats such as JSON in Hive using Hive SerDes.
- Experienced with different compression techniques like LZO, GZip and Snappy.
- Extensively worked on Hive tables, partitions, and buckets for analyzing large volumes of data.
- Worked on MapReduce jobs on XML, JSON, and CSV data formats using SerDes.
- Used Pig/Hive as an ELT tool to perform transformations, joins, and aggregations before storing the data in HDFS.
- Developed shell scripts for adding process dates to the source files.
- Implemented workflows using Oozie and cron jobs.
- Converted Impala scripts into a self-service solution from which the analytics team can pull data.
- Led teammates in converting one-time business reports into self-service solutions and automated reports.
- Worked with project managers to ingest data into the Hadoop ecosystem from all possible sources within the organization and laid the platform for the analytics/R team.
- Performed Hive tuning activities.
- Worked on data lake concepts and converted all ETL jobs into Pig/Hive scripts.
- Reverse engineered the reports and identified the data elements (in the source systems), dimensions, facts, and measures required for new report enhancements.
- Created dashboards, reports, domains, and ad hoc views, built data visualizations using Jaspersoft, and provided analysis of the data.
- Implemented a Spark job to improve query performance.
- Suggested process improvements for all process automation scripts and tasks.
- Member of the core team that selected the reporting tool; analyzed emerging tools such as Qlik Sense and Jethro Data.
- Involved in the technology and architectural meetings across all teams and provided inputs to other teams.
- Involved in grooming sessions with Project Manager/Scrum Master/Technology Director.
- Worked extensively with stakeholders, guiding them on revenue benefits and business model implementations with partners.
Environment: Cloudera, MapReduce, HDFS, Pig, Hive, Sqoop, Oozie, Postgres, AWS, S3, Redshift, EC2, EMR, Jaspersoft Studio and Server, Tableau, SourceTree, Stash, IntelliJ, JIRA, Hortonworks
Confidential, New York City, NY
Sr. Big Data Engineer
- Worked with Sqoop to import and export data between HDFS and relational databases.
- Designed a data warehouse using Hive. Created partitioned tables in Hive.
- Developed Hive UDFs to pre-process the data for analysis.
- Analyzed the data by running Hive queries and Pig scripts to understand artist behavior.
- Worked on historical load for the various feeds.
- Implemented a data lake and converted all ETL jobs into Pig/Hive scripts.
- Created and maintained the Data Model repository as per company standards.
- Wrote MapReduce jobs to generate reports on the number of activities created on a particular day from data dumped from multiple sources, with the output written back to HDFS.
- Worked on Oozie workflows and cron jobs.
- Developed workflows in AWS Data Pipeline to automate loading data into S3 and pre-processing it with Pig/Hive.
- Worked with the Tableau team to create dashboards, built data visualizations using Tableau, and provided analysis of the data.
- Exported analyzed data to S3 using Sqoop for generating reports.
- Extensively used Pig/Hive for data cleansing. Developed Pig Latin/Hive scripts to extract data from web server output files and load it into S3.
- Suggested process improvements for all process automation scripts and tasks.
- Provided technical assistance for configuration, administration and monitoring of Hadoop clusters.
- Used Impala extensively to improve the performance of Hive scripts.
- Participated in evaluation and selection of new technologies to support system efficiency.
- Assisted teammates in creating ETL processes for transforming data sources from the existing system.
- Worked with analysts and the test team to write Hive queries.
- Worked extensively on creating MapReduce jobs to power data for search and aggregation.
- Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in S3.
Environment: Hadoop 1.2, MapReduce, HDFS, Pig, Hive, Sqoop, AWS, S3, Redshift, ParAccel, EC2, EMR, Jaspersoft, Tableau
Confidential, Atlanta, GA
Sr. Hadoop Developer
- Involved in architecture design, development and implementation of Hadoop deployment, backup and recovery systems.
- Worked on the proof-of-concept for Apache Hadoop framework initiation.
- Gained experience with HDFS, MapReduce, and the Hadoop framework.
- Trained and guided the team on Hadoop framework, HDFS, MapReduce concepts.
- Developed MapReduce jobs for Log Analysis, Recommendation and Analytics.
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
- Installed and configured Hadoop HDFS, MapReduce, Pig, Hive, Sqoop.
- Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
- Loaded large amounts of application server logs and MLM data into Cassandra using Sqoop.
- Processed HDFS data and created external tables using Hive, in order to analyze visitors per day, page views and most purchased products.
- Exported analyzed data to Oracle database using Sqoop for generating reports.
- Used MapReduce and Sqoop to load, aggregate, store and analyze web log data from different web servers.
- Developed Hive queries for the analysts.
- Managed cluster coordination services through ZooKeeper.
- Optimized MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS/Cassandra cluster.
Environment: MapReduce, Hive, Pig, Sqoop, Oracle, Cassandra, Cloudera Manager, ZooKeeper.
Confidential, Houston, TX
- Responsible for loading customer data and event logs from MSMQ into HBase using the REST API.
- Responsible for architecting Hadoop clusters with CDH4 on CentOS, managing with Cloudera Manager.
- Initiated and successfully completed a proof of concept on Flume for pre-processing, increased reliability, and easier scalability compared with traditional MSMQ.
- Involved in loading data from the Linux file system to HDFS.
- Imported and exported data into HDFS and Hive using Flume.
- Used Hive to find correlations between customers' browser logs across different sites and analyzed them to build risk profiles for those sites.
- Performed end-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines against very large data sets.
- Developed Pig UDFs to pre-process the data for analysis.
- Monitored Hadoop cluster job performance and performed capacity planning and managed nodes on Hadoop cluster.
- Proficient in using Cloudera Manager, an end-to-end tool for managing Hadoop operations.
Environment: Cloudera Distribution, CDH4, Flume, HBase, HDFS, Pig, MapReduce, Hive
Confidential, Columbus, Ohio
Java/SQL Developer
- Involved in requirement gathering, functional and technical specifications.
- Implemented enhancements to the self-registration process.
- Fixed existing bugs in various releases.
- Handled global deployment of the application and coordination between the client, development team, and end users.
- Created tables and stored procedures in SQL for data manipulation and retrieval, and performed database modifications using SQL, PL/SQL, stored procedures, triggers, and views in Oracle 9i.
- Used Java collection classes such as ArrayList, Vector, HashMap, and Hashtable.
- Used design patterns including MVC, Singleton, Factory, and Abstract Factory.
- Developed algorithms and coded programs in Java.
- Coordinated with different IT groups and the customer.
- Performed all types of testing, including unit and integration testing, across testing environments.
Environment: Java, STL, Design Patterns, Oracle, SQL, PL/SQL.
Java Developer and ETL Developer
- Involved in Full Life Cycle Development in Distributed Environment Using Java and J2EE framework.
- Responsible for developing and modifying the existing service layer based on the business requirements.
- Involved in designing & developing web-services using SOAP and WSDL.
- Involved in database design.
- Created User Interface using JSF.
- Used CVS for version control of code and project documents.
- Implemented History Load, Incremental Load and ETL logic using Informatica 8.6/9.1 as per the ETL design document and Technical design document.
- Familiar with ETL standards and processes; developed ETL logic per standards for the Source to Flat File, Flat File to Stage, Stage to Work, Work to Work Interim, and Work Interim to Target table flows.
- Prepared ETL scripts for data acquisition and transformation. Developed various mappings using transformations such as Source Qualifier, Joiner, Filter, Router, Expression, and Lookup.
- Analyzed existing ETL jobs to understand the flow.
Java/UI Developer
- Involved in the design, coding, deployment and maintenance of the project.
- Involved in design and implementation of web tier using Servlets and JSP.
- Performed client-side validations using JavaScript.
- Used Apache POI for reading Excel files.
- Wrote build scripts with Ant for deploying WAR and EAR applications.
- Configured connection pools and established connections with MySQL.
- Performed unit testing of the application using the JUnit framework.
- Worked on front-end enhancements.
Environment: Java, J2EE, Tomcat, MySQL, Eclipse, Apache POI, JavaScript, CSS, HTML.