Sr. Big Data Engineer Resume
Fremont, CA
SUMMARY
- 7+ years of experience designing and developing end-to-end big data solutions using Spark, Hadoop, HBase, Hive, Sqoop, and AWS services.
- Designed and developed several data pipelines using PySpark and Hadoop MapReduce.
- Extensive experience in SAS and SQL coding and programming, data modeling, and data mining.
- Hands-on experience with Azure cloud services (PaaS & IaaS): Azure Synapse Analytics, Azure SQL, Data Factory, Databricks, Azure Analysis Services, Application Insights, Azure Monitor, Key Vault, and Azure Data Lake.
- Experienced with databases including Oracle, DB2, Teradata, Netezza, and SQL Server, as well as XML, big data, and NoSQL sources.
- Experience working with Python scripts and Hive and Pig queries.
- Hands-on experience with Hadoop ecosystem components such as Hive, Pig, Sqoop, HBase, Cassandra, Spark, Spark Streaming, Spark SQL, Oozie, ZooKeeper, Kafka, Flume, the MapReduce framework, YARN, Scala, and Hue.
- Experienced with Hadoop ecosystem and big data components including Apache Spark, Scala, Python, HDFS, MapReduce, and Kafka.
- Experience building and architecting multiple data pipelines, including ETL and ELT processes for data ingestion and transformation on GCP using BigQuery, Dataproc, Cloud SQL, and Datastore.
- Experience with Google Cloud Platform (GCP) components, Google Container Builder, GCP client libraries, and the Cloud SDK.
- Extensively used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data and to run the required data validations (a brief sketch follows this list).
- Big Data/Hadoop, data analysis, and data modeling professional with applied information technology experience.
- Hands-on experience with Amazon Web Services: EC2, S3, RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda, EMR, and other services in the AWS family.
- Experienced in business requirements gathering using Agile, Scrum, and Waterfall methods, and in software development life cycle (SDLC) testing methodologies, disciplines, tasks, resources, and scheduling.
- Extensive experience working with and integrating NoSQL databases: DynamoDB, Cosmos DB, MongoDB, Cassandra, and HBase.
- Good experience with Python web frameworks such as Django, Flask, and Pyramid.
- Experience importing and exporting data with Sqoop between HDFS and relational database systems.
- Experience working with Git and Bitbucket version control systems.
- Experience with Hadoop distributions such as Cloudera and Hortonworks.
- Strong experience in Unix and shell scripting; experience with source control repositories such as SVN, CVS, and GitHub.
- Expertise in creating, debugging, scheduling, and monitoring jobs using Airflow and Oozie (see the DAG sketch after this list).
- Good experience with the software development life cycle (SDLC) and Agile methodology.
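A minimal PySpark sketch of the DataFrame-style validations over Hive data mentioned above; the table and column names (customer_events, customer_id, event_ts) are hypothetical.

```python
# Minimal sketch: DataFrame validations over a Hive table.
# Table and column names (customer_events, customer_id, event_ts) are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-data-validation")
         .enableHiveSupport()          # read tables registered in the Hive metastore
         .getOrCreate())

df = spark.table("customer_events")

# Validation 1: required keys must not be null.
null_keys = df.filter(F.col("customer_id").isNull()).count()

# Validation 2: no duplicate (customer_id, event_ts) pairs.
dupes = (df.groupBy("customer_id", "event_ts")
           .count()
           .filter(F.col("count") > 1)
           .count())

# Validation 3: event timestamps must not be in the future.
future_rows = df.filter(F.col("event_ts") > F.current_timestamp()).count()

if null_keys or dupes or future_rows:
    raise ValueError(
        f"Validation failed: {null_keys} null keys, "
        f"{dupes} duplicate keys, {future_rows} future-dated rows")
```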
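And a minimal Airflow 2.x DAG sketch for the kind of job scheduling mentioned above; the DAG id, schedule, and shell commands are illustrative assumptions, not the actual jobs.

```python
# Minimal Airflow 2.x DAG sketch for a daily ingest-then-transform job.
# DAG id, schedule, and the bash commands are illustrative.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 2,                         # retry failed tasks twice
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_ingest",
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 2 * * *",        # run at 02:00 every day
    catchup=False,
    default_args=default_args,
) as dag:

    ingest = BashOperator(
        task_id="sqoop_ingest",
        bash_command="sqoop job --exec daily_ingest_job",          # illustrative
    )

    transform = BashOperator(
        task_id="spark_transform",
        bash_command="spark-submit /jobs/transform.py {{ ds }}",   # illustrative
    )

    ingest >> transform                   # transform runs only after ingest succeeds
```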
PROFESSIONAL EXPERIENCE
Sr. Big Data Engineer
Confidential, Fremont, CA
Responsibilities:
- Responsible for working with various teams to develop an analytics-based solution specifically targeting customer subscribers.
- Architected and implemented medium to large-scale BI solutions on Azure using Azure data platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL).
- Migrated on-premises data (Oracle, SQL Server, DB2, MongoDB) to Azure Data Lake Storage (ADLS) using Azure Data Factory (ADF V1/V2).
- Responsible for wide-ranging data ingestion using Sqoop and HDFS commands; accumulated partitioned data in storage formats such as text, JSON, and Parquet.
- Worked on tickets opened by users for various incidents and requests.
- Wrote UNIX shell scripts to automate jobs and scheduled them as cron jobs using crontab.
- Translated business problems into big data solutions and defined the big data strategy and roadmap; installed, configured, and maintained data pipelines.
- Developed features, scenarios, and step definitions for BDD (Behavior-Driven Development) and TDD (Test-Driven Development) using Cucumber, Gherkin, and Ruby.
- Designed the business requirement collection approach based on the project scope and SDLC methodology.
- Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back.
- Implemented Kafka producer and consumer applications on a Kafka cluster set up with ZooKeeper.
- Used the Spring Kafka API to process messages reliably on the Kafka cluster.
- Involved in all steps of the project's reference data approach to MDM; created a data dictionary and source-to-target mappings for the MDM data model.
- Responsible for designing and developing data ingestion from Kroger using Apache NiFi and Kafka.
- Developed various mappings with sources, targets, and transformations using Informatica Designer.
- Built a real-time pipeline for streaming data using Kafka and Spark Streaming (see the streaming sketch after this section).
- Designed both 3NF data models for OLTP systems and dimensional data models using star and snowflake schemas.
- Created and maintained SQL Server scheduled jobs that executed stored procedures to extract data from Oracle into SQL Server; used Tableau extensively for customer marketing data visualization.
- Optimized models with stochastic gradient descent and fine-tuned parameters both manually and with automated tuning such as Bayesian optimization.
- Developed mappings using transformations such as Expression, Filter, Joiner, and Lookup for better data massaging and to migrate clean, consistent data.
- Used Apache Spark DataFrames, Spark SQL, and Spark MLlib extensively, and designed and developed POCs using Scala, Spark SQL, and the MLlib libraries.
- Ingested, transformed, and integrated structured data and delivered it to a scalable data warehouse platform, using traditional ETL (Extract, Transform, Load) tools and methodologies to collect data from various sources into a single data warehouse.
- Designed and developed the architecture for a data services ecosystem spanning relational, NoSQL, and big data technologies.
- Used SQL Server Integration Services (SSIS) to extract, transform, and load data from multiple sources into the target system.
- Wrote production-level machine learning classification and ensemble classification models from scratch in Python and PySpark to predict binary values for selected attributes within a given time frame.
- Configured an S3 event to trigger a Lambda function that converts files from CSV to JSON and loads them into DynamoDB (see the Lambda sketch after this section).
- Performed day-to-day Git support for different projects; responsible for the design and maintenance of Git repositories and access control strategies.
Environment: Spark Streaming, Hive, Scala, Hadoop, Kafka, Spark, Sqoop, Docker, Spark SQL, TDD, Pig, NoSQL, Impala, Oozie, HBase, Data Lake, ZooKeeper, Azure, Unix/Linux shell scripting, Python, PyCharm, Informatica, Informatica PowerCenter, Linux.
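A minimal sketch of the Kafka-to-Spark streaming pipeline referenced above, written here with Structured Streaming; broker addresses, the topic name, and the HDFS paths are hypothetical.

```python
# Minimal Structured Streaming sketch: consume a Kafka topic and land it in HDFS.
# Requires the spark-sql-kafka package on the classpath; brokers, topic, and
# paths below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
       .option("subscribe", "events")
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers bytes; cast the value to string and keep the record timestamp.
events = raw.select(
    F.col("value").cast("string").alias("payload"),
    F.col("timestamp").alias("event_ts"),
)

query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/events")            # illustrative path
         .option("checkpointLocation", "hdfs:///chk/events")
         .trigger(processingTime="1 minute")
         .start())

query.awaitTermination()
```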
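A minimal sketch of the S3-triggered Lambda handler referenced above; the DynamoDB table name (converted_records) and the assumption that each CSV row carries its own key column are hypothetical.

```python
# Minimal AWS Lambda sketch: on an S3 put event, read the CSV object,
# convert each row to a JSON-like item, and write it to DynamoDB.
# Table name and CSV layout are hypothetical.
import csv
import io

import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("converted_records")

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        rows = csv.DictReader(io.StringIO(body))

        # Each CSV row becomes one DynamoDB item (column headers become attribute names).
        with table.batch_writer() as batch:
            for row in rows:
                batch.put_item(Item=dict(row))

    return {"status": "ok"}
```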
Big Data Engineer
Confidential, Coppell, TX
Responsibilities:
- Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation across multiple file formats.
- Used SSIS to build automated multi-dimensional cubes.
- Used Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS and NoSQL databases such as HBase and Cassandra using Python.
- Developed a Spark Streaming application to read raw packet data from Kafka topics, format it as JSON, and push it back to Kafka for future use cases.
- Collected data from an AWS S3 bucket in near real time using Spark Streaming, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.
- Authored Python (PySpark) scripts with custom UDFs for row/column manipulations, merges, aggregations, stacking, data labeling, and all cleaning and conforming tasks (see the UDF sketch after this section).
- Designed the business requirement collection approach based on the project scope and SDLC methodology.
- Created data pipelines for gathering, cleaning, and optimizing data using Hive and Spark.
- Gathered data stored in AWS S3 from various third-party vendors, optimized it, and joined it with internal datasets to extract meaningful information.
- Combined various datasets in Hive to generate business reports.
- Translated business problems into big data solutions and defined the big data strategy and roadmap.
- Prepared and uploaded SSRS reports; managed database and SSRS permissions.
- Used Apache NiFi to copy data from the local file system to HDP; thorough understanding of AML modules including watch list filtering, suspicious activity monitoring, CTR, CDD, and EDD.
- Used SQL Server Management Studio to check data in the database against the given requirements.
- Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on data in HDFS.
- Developed solutions leveraging ETL tools and identified opportunities for process improvement using Informatica and Python.
- Connected Tableau to AWS Redshift to extract live data for real-time analysis.
- Designed and implemented multiple ETL solutions across various data sources using extensive SQL scripting, ETL tools, Python, shell scripting, and scheduling tools; performed data profiling and data wrangling of XML, web feeds, and files using Python, Unix, and SQL.
- Loaded data from different sources into a data warehouse and performed data aggregations for business intelligence using Python.
- Designed and implemented Sqoop incremental jobs to read data from DB2 and load it into Hive tables, and connected Tableau to HiveServer2 to generate interactive reports.
- Used Sqoop to move data between HDFS and RDBMS sources.
- Conducted root cause analysis and resolved production problems and data issues.
- Performed performance tuning, code promotion, and testing of application changes.
- Created AWS Lambda functions and assigned roles to run Python scripts, and used Java-based Lambda functions for event-driven processing; created Lambda jobs and configured roles using the AWS CLI.
- Developed a data platform from scratch and took part in the requirement gathering and analysis phase, documenting the business requirements.
- Worked on dimensional and relational data modeling using star and snowflake schemas for OLTP/OLAP systems, and on conceptual, logical, and physical data modeling using Erwin.
- Developed automated regression scripts in Python to validate ETL processes across multiple databases such as AWS Redshift, Oracle, MongoDB, and SQL Server (T-SQL) (see the validation sketch after this section).
- Developed NiFi workflows to pick up data from a REST API server, the data lake, and an SFTP server and send it to the Kafka broker.
Environment: HDFS, NiFi, Pig, Hive, Cloudera Manager (CDH5), Hadoop, PySpark, S3, Kafka, Scrum, Git, Sqoop, Oozie, Informatica, Tableau, OLTP, OLAP, HBase, Python, shell scripting, XML, Unix, Cassandra, SQL Server.
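A minimal sketch of the custom PySpark UDF work referenced above; the column names (raw_phone, status) and the specific cleaning rule are hypothetical.

```python
# Minimal sketch of a custom PySpark UDF used for cleaning/conforming columns.
# Column names (raw_phone, status) and the normalization rule are hypothetical.
import re

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-cleaning").getOrCreate()

@F.udf(returnType=StringType())
def normalize_phone(value):
    """Strip non-digits and keep the last 10 digits, or None if too short."""
    if value is None:
        return None
    digits = re.sub(r"\D", "", value)
    return digits[-10:] if len(digits) >= 10 else None

df = spark.createDataFrame(
    [("(555) 123-4567", "ACTIVE "), ("n/a", None)],
    ["raw_phone", "status"],
)

cleaned = (df
           .withColumn("phone", normalize_phone("raw_phone"))       # custom UDF
           .withColumn("status", F.upper(F.trim(F.col("status")))))  # built-in conforming

cleaned.show()
```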
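A minimal sketch of the cross-database ETL validation referenced above; it compares row counts over generic DB-API connections, so it covers the relational targets (Redshift, Oracle, SQL Server), and the table names in the usage note are hypothetical.

```python
# Minimal sketch of a cross-database ETL validation check used in automated
# regression scripts: compare row counts for the same tables between a source
# and a target connection. Any DB-API 2.0 connection works here
# (e.g. psycopg2 for Redshift/Postgres, pyodbc for SQL Server).
def row_count(conn, table):
    """Return SELECT COUNT(*) for the given table on a DB-API connection."""
    cur = conn.cursor()
    try:
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        return cur.fetchone()[0]
    finally:
        cur.close()

def validate_counts(source_conn, target_conn, tables):
    """Return (table, source_count, target_count) tuples wherever counts differ."""
    mismatches = []
    for table in tables:
        src = row_count(source_conn, table)
        tgt = row_count(target_conn, table)
        if src != tgt:
            mismatches.append((table, src, tgt))
    return mismatches

# Example usage (connections created elsewhere with psycopg2/pyodbc; table names hypothetical):
#   bad = validate_counts(redshift_conn, sqlserver_conn, ["orders", "customers"])
#   if bad:
#       raise SystemExit(f"ETL validation failed: {bad}")
```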
UI Developer
Confidential, Indianapolis, IN
Responsibilities:
- Designed, developed, and tested the web application using HTML, CSS, Bootstrap, React.js, and Redux.
- Wrote middleware with redux-promise to retrieve data from the back end and to call RESTful services.
- Created and used reducers that received dispatched actions to modify the store's state tree.
- Used React.js templating for faster rendering and for developing reusable components.
- Worked in React.js to create interactive UIs using one-way data flow, the virtual DOM, JSX, and React Native concepts.
- Maintained state in the stores and dispatched actions using Redux.
- Added dynamic functionality by creating and dispatching action creators that issued actions.
- Designed and created a custom reusable React component library.
- Worked with react-autocomplete to build a Google Maps location search on the webpage.
- Used RESTful web services with POST, PUT, DELETE, and GET methods.
- Used Git for version control and regularly pushed code to GitHub.
- Developed dynamic, cross-browser-compatible pages using HTML, CSS, and JavaScript.
- Developed single-page applications (SPAs) using React.js.
- Managed global state with React-Redux.
- Worked with HTTP services, sending and receiving JSON data through RESTful APIs.
- Tested RESTful web service API calls using POST, PUT, DELETE, and GET methods.
- Used SVN and Git as version control tools for the application.
Environment: HTTP, CSS, GitHub, RESTful API, React, SVN, JavaScript, HTML, Hive, SQL, HDFS, Hadoop, Sqoop, Oozie.