Senior Big Data Engineer Resume

Columbia, SC

SUMMARY

  • Over 7 years of professional IT experience, including about 5 years of Big Data expertise using the Hadoop framework, covering analysis, design, development, documentation, deployment, and integration with SQL and Big Data technologies.
  • Experience in implementing various Big Data Analytical, Cloud Data engineering, Data Warehouse/ Data Mart, Data Visualization, Reporting, Data Quality, and Data virtualization solutions.
  • Have proven track record of working as Data Engineer on Amazon cloud services, Bigdata/Hadoop applications and product development.
  • Experience in designing the Conceptual, Logical and Physical data modeling using Erwin and E/R Studio Data modeling tools, AWS.
  • Well versed in big data on AWS cloud services such as EC2, S3, Glue, Athena, DynamoDB, and Redshift.
  • Experience in job/workflow scheduling and monitoring tools like Oozie, AWS Data pipeline & Autosys.
  • Provisioned highly available EC2 instances using Terraform and CloudFormation, and wrote new plugins to support new functionality in Terraform.
  • Defined and deployed monitoring, metrics, and logging systems on AWS.
  • Experience creating and running Docker images with multiple microservices.
  • Good experience in deploying, managing, and developing with MongoDB clusters.
  • Docker container orchestration using ECS, ALB and lambda.
  • Experience with Unix/Linux systems with scripting experience and building data pipelines.
  • Responsible for migration of application running on premise onto Azure cloud.
  • Experience in detailed system design using use case analysis, functional analysis, and program modeling with class, sequence, activity, and state diagrams using UML and Rational Rose.
  • Experience with cloud databases and data warehouses (SQL Azure and Amazon Redshift/RDS).
  • Excellent Programming skills at a higher level of abstraction using Scala and Java
  • Proficiency in multiple databases like MongoDB, Cassandra, MySQL, ORACLE, and MS SQL Server.
  • Cluster monitoring and troubleshooting using tools such as Cloudera, Ganglia, Nagios, and Ambari metrics.
  • Extensively worked on Spark using Scala on a cluster for computational analytics; installed it on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle.
  • Expert in setting up Hortonworks clusters with and without Ambari.
  • Experience in deploying and managing multi-node development and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, Flume, HCatalog, HBase, ZooKeeper) using Hortonworks Ambari.
  • Played a key role in migrating Cassandra, Hadoop cluster on AWS and defined different read/write strategies.
  • Strong SQL development skills including writing Stored Procedures, Triggers, Views, and User Defined functions.
  • Expert in developing SSIS/DTS Packages to extract, transform and load (ETL) data into data warehouse/ data marts from heterogeneous sources.
  • Experienced in working with Spark eco system using SCALA and HIVE Queries on different data formats like Text file and parquet.
  • Experienced in Apache Spark for implementing advanced procedures like text analytics and processing using the in-memory computing capabilities written in Scala.
  • Hands on experience in installing, configuring and using Apache Hadoop ecosystem components like Hadoop Distributed File System (HDFS), MapReduce, PIG, HIVE, HBASE, Apache Crunch, ZOOKEEPER, SQOOP, Hue, Scala, Solr, Git, Maven, AVRO, JSON and CHEF.
  • Extensive experience in loading and analyzing large datasets with Hadoop framework (MapReduce, HDFS, PIG, HIVE, Flume, Sqoop, SPARK, Impala, Scala), NoSQL databases like MongoDB, HBase, Cassandra.
  • Good understanding of software development methodologies, including Agile (Scrum).
  • Expertise in development of various reports, dashboards using various Tableau visualizations.
  • Hands on experience with different programming languages such as Java, Python, R, SAS.
  • Experience in using different Hadoop eco system components such as HDFS, YARN, MapReduce, Spark, Pig, Sqoop, Hive, Impala, HBase, Kafka, and Crontab tools.
  • Expert in creating Hive UDFs using Java to analyze data sets for complex aggregate requirements.
  • Experience in developing ETL applications on large volumes of data using different tools: MapReduce, Spark-Scala, PySpark, Spark-SQL, and Pig.
  • Experience in using SQOOP for importing and exporting data from RDBMS to HDFS and Hive.
  • Experience on MS SQL Server, including SSRS, SSIS, and T-SQL.

TECHNICAL SKILLS

Big Data Ecosystem: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Kafka, Flume, Cassandra, Impala, Oozie, Zookeeper, Amazon Web Services (AWS), EMR

Machine Learning Classification Algorithms: Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbor (KNN), Gradient Boosting Classifier, Extreme Gradient Boosting Classifier, Support Vector Machine (SVM), Artificial Neural Networks (ANN), Naïve Bayes Classifier, Extra Trees Classifier, Stochastic Gradient Descent, etc.

Cloud Technologies: AWS, Azure, Google cloud platform (GCP)

IDE’s: IntelliJ, Eclipse, Spyder, Jupyter

Ensemble and Stacking: Averaged Ensembles, Weighted Averaging, Base Learning, Meta Learning, Majority Voting, Stacked Ensemble, AutoML - Scikit-Learn, MLjar, etc.

Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server, HBASE

Programming / Query Languages: Java, SQL, Python Programming (Pandas, NumPy, SciPy, Scikit-Learn, Seaborn, Matplotlib, NLTK), NoSQL, PySpark, PySpark SQL, SAS, R Programming (Caret, glmnet, XGBoost, rpart, ggplot2, sqldf), RStudio, PL/SQL, Linux shell scripts, Scala.

Data Engineer/Big Data Tools / Cloud / Visualization / Other Tools: Databricks, Hadoop Distributed File System (HDFS), Hive, Pig, Sqoop, MapReduce, Spring Boot, Flume, YARN, Hortonworks, Cloudera, Mahout, MLlib, Oozie, Zookeeper, AWS, Azure Databricks, Azure Data Explorer, Azure HDInsight, Salesforce, NiFi, GCP, Google Shell, Linux, BigQuery, Bash Shell, Unix, Tableau, Power BI, SAS, Web Intelligence, Crystal Reports, Dashboard Design.

PROFESSIONAL EXPERIENCE

Confidential, Columbia, SC

Senior Big Data Engineer

Responsibilities:

  • Analyzing large amounts of data sets to determine optimal way to aggregate and report on these data sets.
  • Designed and Implemented Big Data Analytics architecture, transferring data from Oracle.
  • Created DDL's for tables and executed them to create tables in the warehouse for ETL data loads.
  • Implemented logical and physical relational database and maintained Database Objects in the data model using Erwin.
  • Design, Implement and maintain Database Schema, Entity relationship diagrams, Data modeling, Tables, Stored procedures, Functions and Triggers, Constraints, clustered and non-clustered indexes, partitioning tables, Schemas, Functions, Views, Rules, Defaults, and complex SQL statement for business requirements and enhancing performance.
  • Developed data pipeline using Flume, Sqoop, Pig and Java map reduce and Spark to ingest customer behavioral data and purchase histories into HDFS for analysis.
  • Designed the data marts using the Ralph Kimball's Dimensional Data Mart modeling methodology using Erwin.
  • Connected to AWS Redshift through Tableau to extract live data for real time analysis.
  • Exporting the analyzed and processed data to the RDBMS using Sqoop for visualization and for generation of reports for the BI team.
  • Responsible for creating on-demand tables on S3 files using Lambda Functions and AWS Glue using Python and PySpark.
  • Worked on designing, building, deploying, and maintaining Mongo DB.
  • Design SSIS packages to bring data from existing OLTP databases to new data warehouse using various transformations and tasks like Sequence Containers, Script, for loop and Foreach Loop Container, Execute SQL/Package, Send Mail, File System, Conditional Split, Data Conversion, Derived Column, Lookup, Merge Join, Union All, OLE DB source and destination, excel source and destination with multiple data flow tasks.
  • Developed an ETL framework using Spark and Hive (including daily runs, error handling, and logging) to produce useful data.
  • Coordinated with the team and developed a framework to generate daily ad hoc reports and extracts from enterprise data, automated using Oozie.
  • Improve the performance of SSIS packages by implementing parallel execution, removing unnecessary sorting, and using optimized queries and stored procedures.
  • Optimizing of existing algorithms in Hadoop using Spark Context, Spark-SQL, Data Frames and Pair RDD's.
  • Developed pipeline for POC to compare performance/efficiency while running pipeline using the AWS EMR Spark cluster and Cloud Dataflow on GCP.
  • Configure and manage data sources, data source views, cubes, dimensions, mining structures, roles, defined hierarchy and usage-based aggregations with SSAS.
  • Worked on scalable distributed data system using Hadoop ecosystem in AWS EMR and MapR (MapR data platform).
  • Responsible for maintaining and tuning existing cubes using SSAS and Power BI.
  • Worked on cloud deployments using maven, docker and Jenkins.
  • Designed and Co-ordinated with Data Science team in implementing Advanced Analytical Models in Hadoop Cluster over large Datasets.
  • Created monitors, alarms, notifications and logs for Lambda functions, Glue Jobs, EC2 hosts using CloudWatch.
  • Used AWS Glue for data transformation, validation, and cleansing.
  • Used Python Boto3 to configure AWS Glue, EC2, and S3.
  • Architected and designed serverless application CI/CD using the AWS Serverless Application Model (Lambda).
  • Understanding of cloud technology.
  • Participated in the development, improvement, and maintenance of Snowflake database applications.
  • Built an on-demand, secure EMR launcher with custom spark-submit steps using S3 events, SNS, KMS, and Lambda functions.
  • Managed AWS infrastructure as code using Terraform.
  • Extensively involved in infrastructure as code, execution plans, resource graph and change automation using Terraform.
  • Built the logical and physical data models for Snowflake as per the changes required.
  • Evaluated Snowflake design considerations for any change in the application.
  • Developed stored procedures/views in Snowflake and used them in Talend for loading Dimensions and Facts.
  • Developed merge scripts to UPSERT data into Snowflake from an ETL source.

Environment: Hadoop, Map Reduce, HDFS, Hive, Ni-fi, Spring Boot, Cassandra, Swamp, Data Lake, Sqoop, Oozie, SQL, Kafka, Spark, Scala, Java, AWS, GitHub, Talend Big Data Integration, Solr, Impala.

Confidential - Boise, Idaho

Senior Big Data Developer

Responsibilities:

  • Extract Transform and Load data from Sources Systems to Azure Data Storage services using a combination of Azure Data Factory, Informatica BDM, T-SQL, Spark SQL, and Azure Data Lake Analytics. Data Ingestion to one or more Azure Services - (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processing the data in Azure Databricks.
  • Implemented the ETL process through Informatica BDM and Python scripting to load data from Denodo (virtualization layer) to ThoughtSpot, helping the business run advanced analytics algorithms.
  • Applied Hadoop MapReduce performance tuning techniques and built efficient Hive queries.
  • Designed the distribution strategy for tables in Azure SQL Data Warehouse.
  • Built a CI/CD pipeline for code deployment to higher environments.
  • Technical Validation of functional specification and data mapping from source model into target model.
  • Collaborate with business stakeholders to identify and meet data requirements.
  • Involved in the Workshop and High-Level Design of Claims Monthly Management Metrics (C3M) project under DEEP that enables claim historical reporting.
  • Supported business development in Claims Area by doing impact analysis, effort estimation and future state solution development.
  • Developed the features, scenarios, and step definitions for BDD (Behavior Driven Development) and TDD (Test Driven Development) using Cucumber, Gherkin, and Ruby.
  • Designing the business requirement collection approach based on the project scope and SDLC methodology.
  • Created data pipelines integrating Kafka with a Spark Streaming application, using Scala to write the applications.
  • Used Spark SQL to read data from external sources and processed the data using the Scala computation framework.
  • Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from different sources like Azure SQL, Blob storage, Azure SQL Data Warehouse, and the write-back tool, and backwards.
  • Extracted files from Hadoop and dropped them into S3 on a daily and hourly basis. Worked with Data Governance and Data Quality teams to design various models and processes.
  • Experience in deploying the Spring Boot Microservices to Pivotal Cloud Foundry (PCF) using build pack and Jenkins for continuous integration, Deployments in Pivotal Cloud Foundry (PCF) and binding of Services in Cloud and Installed Pivotal Cloud Foundry (PCF) on Azure to manage the containers created by PCF.
  • Implemented Big Data Analytics and Advanced Data Science techniques to identify trends, patterns, and discrepancies on petabytes of data by using Azure Databricks, Hive, Hadoop, Python, PySpark, Spark SQL, MapReduce, and Azure Machine Learning.
  • Analyzed clickstream data from Google Analytics with BigQuery. Designed APIs to load data from Omniture, Google Analytics, and Google BigQuery.
  • Maintained JIRA team and program management review dashboards and maintained COP account and JIRA team sprint metrics reportable to customer and SAIC division management.

Environment: Hadoop, Kafka, Spark, Sqoop, Docker, Swamp, Big Query, Spark SQL, TDD, Spark-Streaming, Hive, Scala, pig, NoSQL, Impala, Oozie, HBase, Data Lake, Zookeeper.

Confidential - Newark, DE

Senior Data Engineer/Scala Developer

Responsibilities:

  • Responsible for architecting Hadoop clusters and translating functional and technical requirements into detailed architecture and design.
  • Worked on analyzing the Hadoop cluster and different big data analytical and processing tools including Pig, Hive, Spark, and Spark Streaming.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Migrated various Hive UDFs and queries into Spark SQL for faster requests.
  • Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data to HDFS using Scala.
  • Hands-on experience in Spark and Spark Streaming, creating RDDs and applying operations (transformations and actions).
  • Developed multiple POCs using Scala and deployed on the YARN cluster, compared the performance of spark with Hive and SQL/Teradata.
  • Analyzed the SQL scripts and designed the solution to implement using Scala.
  • Experienced in scheduling jobs using Control-M.
  • Knowledgeable in Spark and Scala Framework exploration for transition from Hadoop /Map Reduce to Spark.
  • Developed analytical component using Scala, Spark, and Spark stream.
  • Developed and implemented Hive custom UDFs involving date functions.
  • Used Sqoop to import data from Oracle to Hadoop.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
  • Experienced in developing scripts for transformations using Scala.
  • Involved in developing shell scripts to orchestrate execution of all other scripts and move data files within and outside of HDFS.
  • Installed and configured Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
  • Used Kafka for publish-subscribe messaging as a distributed commit log, gaining experience with its speed, scalability, and durability.
  • Used Tableau to generate weekly reports for the customer.
  • Analyzed the Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase, and Sqoop.
  • Implemented applications with Scala along with Akka and Play framework.
  • Expert in implementing advanced procedures like text analytics and processing using the in-memory computing capabilities like Apache Spark written in Scala.
  • Auction web app - calculated bids for energy auctions utilizing Scala, JPA and Oracle.
  • Developed a RESTful API using Scala for tracking open-source projects in GitHub and computing the in-process metrics information for those projects.
  • Developed analytical components using Scala, Spark, Apache Mesos and Spark Stream.
  • Implemented the Kerberos security authentication protocol for the existing cluster.
  • Performed reverse engineering using Erwin to redefine entities, attributes, and relationships in the existing database.
  • Analyzed functional and non-functional business requirements, translated them into technical data requirements, and created or updated logical and physical data models. Developed a data pipeline using Kafka to store data into HDFS.
  • Performed regression testing for Golden Test Cases from State (end-to-end test cases) and automated the process using Python scripts.
  • Developed Spark jobs using Scala for faster real-time analytics and used Spark SQL for querying.
  • Generated graphs and reports using ggplot package in RStudio for analytical models. Developed and implemented R and Shiny application which showcases machine learning for business forecasting.
  • Developed predictive models using Decision Tree, Random Forest, and Naïve Bayes.
  • Used pandas, NumPy, seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms. Expertise in R, MATLAB, Python, and their respective libraries.
  • Researched reinforcement learning and control (TensorFlow, Torch) and machine learning models (Scikit-learn).
  • Hands-on experience in implementing Naive Bayes and skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, and Principal Component Analysis.
  • Performed K-means clustering, Regression, and Decision Trees in R. Worked on data cleaning and reshaping, and generated segmented subsets using NumPy and Pandas in Python.
  • Implemented various statistical techniques to manipulate the data, such as missing data imputation, principal component analysis, and sampling.
  • Worked on R packages to interface with the Caffe Deep Learning Framework. Performed validation on machine learning output from R.
  • Applied different dimensionality reduction techniques like principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) on the feature matrix.
  • Performed univariate and multivariate analysis on the data to identify any underlying pattern in the data and associations between the variables.
  • Responsible for design and development of Python programs/scripts to prepare transform and harmonize data sets in preparation for modeling.
  • Worked with Market Mix Modeling to strategize the advertisement investments to better balance the ROI on advertisements.
  • Implemented clustering techniques like DBSCAN, K-means, K-means++ and Hierarchical clustering for customer profiling to design insurance plans according to their behavior pattern.
  • Used Grid Search to evaluate the best hyper-parameters for my model and K-fold cross validation technique to train my model for best results.
  • Worked with Customer Churn Models including Random Forest regression, lasso regression along with pre-processing of the data.
  • Used Python 3.X (NumPy, SciPy, pandas, scikit-learn, seaborn) and Spark 2.0 (PySpark, MLlib) to develop variety of models and algorithms for analytic purposes.
  • Performed Data Cleaning, features scaling, features engineering using pandas and NumPy packages in python and build models using deep learning frameworks.
  • Implemented application of various machine learning algorithms and statistical modeling like Decision Tree, Text Analytics, Sentiment Analysis, Naive Bayes, Logistic Regression and Linear Regression using Python to determine the accuracy rate of each model.
  • Implemented Univariate, Bivariate, and Multivariate Analysis on the cleaned data for getting actionable insights on the 500-product sales data by using visualization techniques in Matplotlib, Seaborn, Bokeh, and created reports in Power BI.

Environment: Spark, YARN, HIVE, Pig, Scala, Mahout, NiFi, TDD, Python, Spring Boot, Hadoop, Azure, Dynamo DB, Kibana, NOSQL, Sqoop, MYSQL.

Confidential - Scottsdale, AZ

Data Engineer/Python Developer

Responsibilities:

  • Experience in Big Data Analytics and design in Hadoop ecosystem using MapReduce Programming, Spark, Hive, Pig, Sqoop, HBase, Oozie, Impala, Kafka.
  • Built the Oozie pipeline, which performs several actions: moving files, using Sqoop to bring data from the source Teradata or SQL database into the Hive staging tables, performing aggregations as per business requirements, and loading the results into the main tables.
  • Ran Apache Hadoop, CDH, and MapR distributions on Elastic MapReduce (EMR) on EC2.
  • Performing the forking action whenever there is a scope of parallel process for optimization of data latency.
  • Developed spark code using python for faster processing of data on Hive.
  • Developed entire frontend and backend modules using Python on Django Web Framework.
  • Designed forms, modules, views and templates using Django and Python.
  • Loading, analyzing and extracting data to and from Elastic Search with python.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Wrote a Pig script that reads data from one HDFS path, performs aggregation, and loads the result into another path, from which it is later populated into another domain table. Converted this script into a JAR and passed it as a parameter in the Oozie script.
  • Worked on Spark using python and Spark SQL for faster testing and processing of data.
  • Utilized Apache Spark with Python to develop and execute Big Data analytics and machine learning applications; executed machine learning use cases under Spark ML and MLlib.
  • Hands-on experience with Git Bash commands: git pull to pull the code from source before developing it as per the requirements, git add to add files, git commit after the code build, and git push to the pre-prod environment for code review. Later used screwdriver.yaml, which builds the code and generates the artifacts that are released into production.
  • Created the logical data model from the conceptual model and converted it into the physical database design using Erwin. Involved in transforming data from legacy tables to HDFS and HBase tables using Sqoop.
  • Developed Data mapping, Transformation and Cleansing rules for the Data Management involving OLTP and OLAP.
  • Extensive experience as an Oracle PL/SQL Developer utilizing PL/SQL procedures, functions, packages, triggers, shell scripting, and unit testing; involved in data extraction, transformation, and loading operations on Oracle using SQL Loader.
  • Involved in creating UNIX shell scripts. Performed defragmentation of tables, partitioning, compression, and indexing for improved performance and efficiency.
  • Used the trace facility to check the PL/SQL code on the server side for issues and better performance.
  • Developed a Python script to run SQL queries in parallel for the initial data load into the target table. Involved in loading data from the edge node to HDFS using shell scripting and assisted in designing the overall ETL strategy.
  • Developed reusable objects like PL/SQL program units and libraries, database procedures and functions, database triggers to be used by the team and satisfying the business rules.
  • Used SQL Server Integrations Services (SSIS) for extraction, transformation, and loading data into target system from multiple sources.
  • Developed and implemented R and Shiny application which showcases machine learning for business forecasting. Developed predictive models using Python & R to predict customers churn and classification of customers.
  • Worked with applications like R, SPSS, and Python to develop neural network algorithms, cluster analysis, ggplot2 and shiny in R to understand data and developing applications.
  • Partner with technical and non-technical resources across the business to leverage their support and integrate our efforts.
  • Coded Reports and programs to support Billing functions in SQL, Oracle Reports and PLSQL.
  • Responsible for writing database triggers to validate business logic using PLSQL.
  • Partner with infrastructure and platform teams to configure, tune tools, automate tasks and guide the evolution of internal big data ecosystem; serve as a bridge between data scientists and infrastructure/platform teams.
  • Performed data analysis using regressions, data cleaning, Excel VLOOKUP, histograms, and the TOAD client; presented the analysis and suggested solutions for investors.
  • Rapid model creation in Python using pandas, NumPy, sklearn, and plot.ly for data visualization. These models are then implemented in SAS where they are interfaced with MSSQL databases and scheduled to update on a timely basis.

Environment: HDFS, Map-Reduce, Hive, Pig, Sqoop, Flume, Oozie, Mahout, HBase, Hortonworks data platform distribution, Cassandra, SQL, MSSQL Server, MS Office, MS Visio, Jupyter, R 3.1.2, Python, SAS, MongoDB, HBase.

Confidential

Hadoop Developer

Responsibilities:

  • Involved in installing Hadoop Ecosystem components.
  • Developed and ran MapReduce jobs on multi-petabyte YARN and Hadoop clusters that process billions of events every day, generating daily and monthly reports as per users' needs.
  • Managed and reviewed Hadoop log files and managed data coming from different sources.
  • Participated in development/implementation of Cloudera Hadoop environment.
  • Continuous monitoring and managing the Hadoop cluster using Cloudera Manager.
  • Identified areas of improvement in existing business by unearthing insights by analyzing vast amount of data using machine learning techniques.
  • Interpreted business problems and provided solutions using data analysis, data mining, optimization tools, machine learning techniques, and statistics.
  • Designed and developed NLP models for sentiment analysis.
  • Led discussions with users to gather business processes requirements and data requirements to develop a variety of Conceptual, Logical and Physical Data Models. Expert in Business Intelligence and Data Visualization tools: Tableau, MicroStrategy.
  • Worked on machine learning on large size data using Spark and MapReduce.
  • Led the implementation of new statistical algorithms and operators on Hadoop and SQL platforms and utilized optimization techniques, linear regression, K-means clustering, Naive Bayes, and other approaches.
  • Developed Spark/Scala, Python for regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources.
  • Data sources are extracted, transformed, and loaded to generate CSV data files with Python programming and SQL queries.
  • Stored and retrieved data from data-warehouses using Amazon Redshift.
  • Worked on Teradata SQL queries, Teradata Indexes, Utilities such as Mload, Tpump, Fast load and Fast Export.
  • Configured Hadoop tools like Hive, Pig, Zookeeper, Flume, Impala and Sqoop.
  • Used Data Warehousing Concepts like Ralph Kimball Methodology, Bill Inmon Methodology, OLAP, OLTP, Star Schema, Snowflake Schema, Fact Table and Dimension Table.
  • Refined time-series data and validated mathematical models using analytical tools like R and SPSS to reduce forecasting errors. Queried both managed and external tables created by Hive using Impala.
  • Worked on data pre-processing and cleaning to perform feature engineering, and applied data imputation techniques for missing values in the dataset using Python.
  • Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data. Created various types of data visualizations using Python and Tableau.

Environment: Hadoop, Map Reduce, Spark, Spark MLLib, Tableau, SQL, Excel, VBA, SAS, Matlab, AWS, SPSS, Cassandra, Oracle, MongoDB, SQL Server 2012, DB2, T-SQL, PL/SQL, XML, Tableau.

Confidential 

Data Analyst

Responsibilities:

  • Gathered high-level requirements and developed the scope of the project for the implementation of Microsoft Office SharePoint 2007.
  • Experience with data extraction, transformation, and loading (ETL) using various tools such as Data Transformation Services (DTS), SSIS, and Bulk Insert (BCP).
  • Responsible for creating test scenarios, scripting test cases using testing tool and defect management for Policy Management Systems, Payables/Receivables and Claims processing.
  • Worked on the cash management module of a billing system and enhanced the encryption standards required for the application.
  • Worked on bug tracking reports on daily basis using Quality Center.
  • Designed, developed and tested data mart prototype (SQL 2005), ETL process (SSIS) and OLAP cube (SSAS)
  • Worked on SQL Server concepts SSIS (SQL Server Integration Services), SSAS (Analysis Services) and SSRS (Reporting Services).
  • Use of data transformation tools such as DTS, SSIS, Informatica or Data Stage.
  • Create the data model, design the ETL process approach, identify the metrics, KPI's from the data and design the mockups to present the data on Tableau dashboard.
  • Focused continuous improvement efforts on driving gap closures identified through the implementation of rigorous metrics and Key Performance Indicators (KPI's).
  • Performed Unit Testing and User Acceptance testing and documented detailed results.

Environment: Oracle 10g/9i/8i/7.x, MS SQL Server, UDB DB2 9.x, Teradata, Quality Center, SQL Queries, KPIs, Siebel Analytics, SSIS, Oracle BI, UML, OLAP, Data mining, Teradata SQL Assistant, DBMS.
