- Over 7 years of experience in Big Data Engineering and Data Analysis, with expertise in Big Data technologies.
- A results-driven individual with a passion for data and analytics who works collaboratively with others to solve business problems and drive business growth.
- Proficient in installing, configuring and using Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Flume, YARN, HBase, Sqoop, Spark, Storm, Kafka, Oozie, and ZooKeeper.
- Highly skilled in Big Data tools such as Hadoop, Hive, Pig, and Spark, including using Pig and Hive for basic analysis, data extraction, and data summarization.
- Strong knowledge of Spark with Python for large-scale data processing, including streaming.
- Ability to work with managers and executives to understand business objectives and deliver according to business needs; a firm believer in teamwork.
- Experience in extracting data using technologies such as Python, R, SAS, Azure, and SQL, with hands-on experience writing SQL and R queries to extract, transform, and load (ETL) data from large datasets using data staging.
- Experience working in Agile/Scrum software development.
- Good understanding of and exposure to Python programming.
- Extensive experience in Shell scripting.
- Ability to develop MapReduce programs using Java and Python.
- Strong knowledge of and hands-on experience with data visualization in Tableau, creating bar charts, pie charts, dot charts, boxplots, subplots, histograms, error bars, multiple chart types, time series, etc.
- Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores, relational databases, and tuple stores.
- Experience in designing components using UML (Use Case, Class, Sequence, Deployment, and Component diagrams) based on requirements.
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, and providing data mining and reporting solutions across massive volumes of structured, semi-structured, and unstructured data.
- Knowledge of working with large transactional databases across multiple platforms such as Teradata, SAS, HDFS, and Oracle.
- Versatile team player with good communication, analytical, presentation, and interpersonal skills.
- Experience using technology such as scripting, data cleansing tools, and statistical software packages to work efficiently with datasets.
- Strong understanding of how analytics supports a large organization, including the ability to articulate the linkage between business objectives, analytical approaches and findings, and business decisions.
Software Methodologies: Scrum and Agile methodologies, Waterfall, Test Driven Development
Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Spark, Kafka, Storm and ZooKeeper.
Programming Languages: C, Java, Python, Scala, J2EE, PL/SQL, Pig Latin, HiveQL, Unix shell scripts
Tools & IDE: Eclipse, NetBeans, IntelliJ, RStudio, Tableau, Microsoft Office Suite, ETL tools, AWS (Kinesis, Redshift, Lambda), SonarQube, JIRA
Web Technologies: HTML, DHTML, XML, AJAX, WSDL.
NoSQL Databases: HBase, Cassandra, MongoDB
Databases: MySQL, MS Access, SQL, HBase, Oracle
Version Control: SVN, CVS, GitHub
Web Services: REST, SOAP
Operating Systems: Mac, HP-UX, Red Hat Linux, Ubuntu Linux, and Windows XP/Vista/7/8
Big Data Engineer
Environment: Hadoop, Cloudera, Talend, Python, Spark, HDFS, Hive, Pig, Sqoop, DB2, SQL, Linux, Yarn, NDM, JIRA, Informatica, Windows & Microsoft Office, Tableau.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Worked on batch processing of data sources using Apache Spark and Elasticsearch.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Worked on migrating Pig scripts and MapReduce programs to the Spark DataFrames API and Spark SQL to improve performance.
- Assessed business rules, collaborated with stakeholders, and performed source-to-target data mapping, design, and review.
- Created scripts for importing data into HDFS/Hive using Sqoop from DB2.
- Loaded data from different sources (databases and files) into Hive using the Talend tool.
- Conducted POCs for ingesting data using Flume.
- Used all major ETL transformations to load the tables through Informatica mappings.
- Created Hive queries and tables that helped the line of business identify trends by applying strategies to historical data before promoting them to production.
- Worked with Sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Developed Pig scripts to parse the raw data, populate staging tables, and store the refined data in partitioned DB2 tables for business analysis.
- Worked on managing and reviewing Hadoop log files. Tested and reported defects following Agile methodology.
- Conducted and participated in project team meetings to gather status and discuss issues and action items.
- Involved in report development using reporting tools such as Tableau. Used Excel sheets, flat files, and CSV files to generate ad hoc Tableau reports.
- Provided support for research and resolution of testing issues.
- Coordinated with the business for UAT sign-off.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Oozie, Nagios, Ganglia, Linux, Hue
- Worked on a Hadoop cluster using different big data analytic tools including Pig, Hive, and MapReduce
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis
- Worked on debugging and performance tuning of Hive and Pig jobs
- Created HBase tables to store various data formats of PII data coming from different portfolios
- Implemented test scripts to support test driven development and continuous integration
- Worked on tuning the performance of Pig queries
- Involved in loading data from the Linux file system to HDFS
- Imported and exported data into HDFS and Hive using Sqoop
- Processed unstructured data using Pig and Hive
- Supported MapReduce programs running on the cluster
- Gained experience in managing and reviewing Hadoop log files
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs
- Assisted in monitoring the Hadoop cluster using tools such as Nagios and Ganglia
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts
Environment: Excel, SSIS, Oracle, MS SharePoint, Erwin, Python
- Worked as a Data Analyst to generate data models using Erwin Data Modeler and developed relational database systems.
- Extracted data from various data sources; applied functions, complex formulas, and conditions to clean raw data; and applied VLOOKUP and INDEX-MATCH functions to merge and filter information from various data sets for reporting and statistical gathering purposes.
- Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.
- Involved in data analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats.
- Implemented a metadata repository and maintained data quality, data cleanup procedures, transformations, data standards, the data governance program, scripts, stored procedures, triggers, and execution of test plans.
- Worked along with ETL, BI and DBA teams to analyze and provide solutions to data issues and other challenges during the implementation of the OLAP model.
- Performed end-to-end Informatica ETL testing for custom tables by writing complex SQL queries against the source database and comparing the results with the target database.
- Converted raw data into processed data by merging and by finding outliers, errors, trends, missing values, and distributions in the data.
- Created and designed reports that use gathered metrics to infer and draw conclusions about past and future behavior.
- Documented the complete process flow to describe program development, logic, testing, implementation, application integration, and coding.
Environment: SQL, Excel, SSIS, MS-SQL Databases, MS SharePoint 2010
- Reviewed, evaluated, designed, implemented and maintained reporting to support current business initiatives.
- Successfully completed training on in-house databases, extracted data from various data sources, and applied Excel functions to transform raw data into business information required for reporting and data analysis purposes.
- Wrote SQL code to identify, analyze, and interpret trends in large datasets and to retrieve data for cleaning up data systems.
- Created complex formulas and conditions to clean raw data, and applied VLOOKUP and INDEX-MATCH functions to merge and filter information from various data sets for reporting and statistical gathering purposes.
- Converted raw data to processed data by merging and by finding outliers, errors, trends, missing values, and distributions in the data.
- Worked closely with the ETL, SSIS, and SSRS developers to explain complex data transformation logic.
- Led data discovery, handling structured and unstructured data, cleaning data, performing descriptive analyses, and storing results as normalized tables for dashboards.