Data Engineer & Analyst Resume

CA

SUMMARY

  • Around 5 years of IT experience in a variety of industries, working on Big Data technology with the Cloudera and Hortonworks distributions.
  • Hadoop working environment includes Hadoop, Spark, Scala, MapReduce, Kafka, Airflow, Hive, Ambari, Sqoop, HBase and Impala.
  • Hands-on experience in developing and deploying enterprise-based applications using major Hadoop ecosystem components like MapReduce, YARN, Hive, HBase, Flume, Sqoop, Spark MLlib, Spark GraphX, Spark SQL and Kafka.
  • Knowledge of relational and multidimensional data design and development techniques including star schema, snowflake schema, cube design, ETL and others.
  • Demonstrated expertise in Snowflake data modelling and ELT using Snowflake SQL, implementing complex stored procedures and best practices with data warehouse and ETL concepts.
  • Designed and implemented effective analytics solutions and models with Snowflake.
  • Worked on continuous platform maturity assessment and improvement of the Snowflake environment and user provisioning.
  • Expertise in advanced Snowflake concepts such as setting up resource monitors, RBAC controls and virtual warehouse sizing.
  • Established connectivity to Snowflake from third-party tools, on-premises systems and other cloud platforms.
  • Experience in developing distributed applications using Scala as a programming language.
  • Writing and developing project features in Scala with the related frameworks.
  • Proficient in SQL for querying, data extraction/transformation and developing queries for a wide range of cloud platforms like AWS, GCP & Azure.
  • SAML and OpenID for authentication/authorization.
  • Designing, creating, and maintaining Scala-based applications.
  • Exchanging authentication and authorization information in various kinds of distributed transactions.
  • Provided basic support for emerging applications, such as SOAP-enabled e-commerce.
  • Developing ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake's SnowSQL; writing SQL queries against Snowflake.
  • Experience in Extraction, Transformation and Loading (ETL) data from various sources into Data Warehouses, as well as data processing like collecting, aggregating and moving data from various sources using Apache Flume, Kafka, PowerBI and Microsoft SSIS.
  • Created data pipelines from AWS S3 to Snowflake and processed structured data into EDW.
  • Capable of processing large sets (Gigabytes) of structured, semi-structured or unstructured data.
  • Experience in analysing data using HiveQL, Pig, HBase and custom MapReduce programs in Java 8.
  • Experience working with GitHub/Git 2.12 source and version control systems.
  • Experience integrating various data sources such as Oracle SE2, SQL Server, flat files and unstructured files into a data warehouse.
  • Design, build and maintain data ingestion pipelines using Scala FS2/Akka Streams/Kafka Streams.
  • Able to use Sqoop to migrate data between RDBMS, NoSQL databases and HDFS.
  • Developed Spark Applications that can handle data from various RDBMS (MySQL, Oracle Database) and Streaming sources.
  • Experience in performance tuning and optimizing code running in Databricks environment.
  • Experience in designing and deploying data applications on cloud solutions, such as Azure or AWS
  • Fluent programming experience with Scala, Java, Python, SQL, T-SQL, R.
  • Hands-on experience with Hadoop architecture and its components such as the Hadoop Distributed File System (HDFS), JobTracker, TaskTracker, NameNode, DataNode and Hadoop MapReduce programming.

TECHNICAL SKILLS

Hadoop/Big Data: Hadoop, MapReduce, HDFS, ZooKeeper, Kafka, Hive, Pig, Sqoop, Oozie, Flume, YARN, HBase, Spark with Scala

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: Java, Python, Scala, PySpark, UNIX shell scripts

Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL

Cloud: AWS, Azure

Operating Systems: Red Hat Linux, Ubuntu Linux, and Windows

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Databases: SQL Server, MySQL, Snowflake, Teradata, HANA

IDE: Eclipse, IntelliJ

PROFESSIONAL EXPERIENCE

DATA ENGINEER & Analyst

Confidential, CA

Responsibilities:

  • Work with Product Owners and Business Analysts in an Agile environment to identify and develop business requirements, translate them into technical requirements, and take responsibility for deliverables.
  • Work in an Agile process to deliver tasks in a 3-week sprint model.
  • Design and build a reusable ingestion framework using BigQuery and Python scripting to load Enterprise Data Lake (EDL) tables.
  • Develop data loads from Cloud Storage files to BigQuery landing tables using federated SQL queries.
  • Implement data Governance process using Authorized views and develop Policy tags to encrypt the PII/PCI columns upon loading to GCP BigQuery.
  • Automated the transition of GCS bucket data from Standard storage to Cold/Archive storage classes to reduce storage costs.
  • Implement best practices using IAM roles for Cloud Storage and BigQuery using Service accounts and fine-grained custom roles.
  • Involved in various phases of development; analysed and developed the system following Agile Scrum. Gathered business requirements by interacting with users, project managers, and SMEs to get a better understanding of the business processes.
  • Designed and modified materialized views with a preferred granularity of the data, like performance improvement, dashboard structure, view orientation, sizing and layout, data emphasis, highlighting, and color for data mining visualizations.
  • Extensively used Visio to create use case diagrams, activity diagrams, and sequence diagrams to conceptually model the sequence of activities involved in ETL processes.
  • Coordinated the development of optimized SQL Server stored procedures and database views.
  • Connected Tableau and SQuirreL SQL clients to Spark SQL (Spark Thrift Server) via a data source and ran the queries.
  • Used Python Pandas library to clean, manipulate and transform data.
  • Used data mining techniques for outlier detection and created algorithm to connect patterns between customer trends.
  • Experienced in Python to manipulate data for data loading and extraction and worked with Python libraries like Matplotlib, NumPy, SciPy, scikit-learn, statsmodels and Pandas for data analysis.
  • Developed and implemented metadata models for reporting functionalities and developed automated process for data corrections.
  • Scheduled, deployed and managed container replicas onto a node cluster using Kubernetes.
  • Wrote JUnit tests and integration test cases for the microservices.
  • Recreated existing application logic and functionality in the Azure Data Lake, Data Factory, SQL Database and SQL Data Warehouse environment.
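
The Pandas cleaning and outlier-detection work listed above can be illustrated with a minimal sketch. This is not the project code: it uses only the Python standard library rather than Pandas so that it runs without dependencies, and the function names, the sample values and the 1.5 x IQR rule are assumptions chosen for the example.

```python
from statistics import quantiles


def to_float(value):
    """Return value as a float, or None if it is missing or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def flag_outliers(raw_values, k=1.5):
    """Clean raw_values and flag points outside [Q1 - k*IQR, Q3 + k*IQR].

    Returns a list of (value, is_outlier) pairs for the values that
    survived cleaning. At least two clean values are expected.
    """
    # Cleaning step: drop missing and non-numeric entries.
    values = [v for v in map(to_float, raw_values) if v is not None]

    # Quartiles of the cleaned data (inclusive method, like a sample summary).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr

    return [(v, not (lower <= v <= upper)) for v in values]


if __name__ == "__main__":
    # Hypothetical customer spend values with a missing and a malformed entry.
    for value, is_outlier in flag_outliers([10, "12", 11, 13, None, "bad", 500]):
        print(value, "outlier" if is_outlier else "ok")
```

The IQR rule is only one possible outlier test; it is used here because it needs no distributional assumptions and is easy to audit.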

Environment: Hadoop, Azure, Microservices, MapReduce, Agile, HBase, JSON, Spark, Kafka, JDBC, Hive, Pig, Oozie, MongoDB, Sqoop, ZooKeeper, Flume, Impala, SQL, Scala, Python, Unix, GitHub.

APPLICATION ENGINEER

Confidential

Responsibilities:

  • Worked with different data formats such as JSON and XML, and applied machine learning algorithms in Python.
  • Rapid model creation in Python using pandas, NumPy, sklearn, and plot.ly for data visualization.
  • These models are then implemented in SAS where they are interfaced with MSSQL databases and scheduled to update on a timely basis.
  • Designed APIs and developed shippable code, documentation, and unit tests for new features of digital products.
  • Worked with fellow API Developers, Team Leads, Architects to deliver features through the creation of re-usable RESTful APIs.
  • Front end design, development and integration.
  • Collaborated with Quality, Product and Cloud Engineering teams to keep digital assets fully functional, secure, and up to date with business needs.
  • Worked on an AWS CI/CD data pipeline and an AWS data lake using EC2, AWS Glue and AWS Lambda.
  • Developed reusable objects like PL/SQL program units and libraries, database procedures and functions, database triggers to be used by the team and satisfying the business rules.
  • Used SQL Server Integration Services (SSIS) for extracting, transforming, and loading data into the target system from multiple sources.
  • Enhanced and updated data models.
  • Converted existing SQL objects to Snowflake.
  • Forked work into parallel processes wherever there was scope to reduce data latency.
  • Implemented reporting in PySpark, Zeppelin and Jupyter, and querying using Airpal, Presto and AWS Athena.
  • Created UNIX shell scripts. Performed defragmentation of tables, partitioning, compression and indexing for improved performance and efficiency.
  • Developed a framework that leverages PySpark to build data pipelines for wholesale capital markets trade-related data.
  • Building data pipelines to consume and ingest application data into various layers of data lake.
  • Developing Automated Quality checks with UNIX shell scripts and reusable Source prep and Loading jobs to verify and reconcile data during data loads.
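
The load-reconciliation checks described above can be sketched as follows. The original checks were written as UNIX shell scripts; this sketch uses Python instead, and the function names, the row representation (tuples) and the checksum scheme are assumptions made for illustration, not the actual jobs.

```python
import hashlib


def table_fingerprint(rows):
    """Return (row_count, checksum) for an iterable of row tuples.

    The checksum is the sum of per-row hashes modulo 2**64, so it does not
    depend on row order, which usually differs between source and target.
    """
    count = 0
    checksum = 0
    for row in rows:
        count += 1
        encoded = "|".join(map(str, row)).encode("utf-8")
        row_hash = hashlib.sha256(encoded).digest()
        checksum = (checksum + int.from_bytes(row_hash[:8], "big")) % 2**64
    return count, checksum


def reconcile(source_rows, target_rows):
    """Compare a source extract with the loaded target rows."""
    src_count, src_sum = table_fingerprint(source_rows)
    tgt_count, tgt_sum = table_fingerprint(target_rows)
    return {
        "row_count_match": src_count == tgt_count,
        "checksum_match": src_sum == tgt_sum,
    }


if __name__ == "__main__":
    # Hypothetical rows: same content, different order.
    source = [(1, "alice"), (2, "bob")]
    target = [(2, "bob"), (1, "alice")]
    print(reconcile(source, target))
```

A row count alone misses corrupted values, and a checksum alone is harder to interpret, which is why the sketch reports both.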

Environment: MapReduce, Spark, Hive, Pig, Sqoop, HBase, Oozie, Impala, AWS, Kafka, JSON, XML, PL/SQL, SQL, HDFS, Unix, Python, PySpark

SPARK DEVELOPER

Confidential

Responsibilities:

  • Imported required modules such as Keras and NumPy on Spark session, also created directories for data and output.
  • After executing the program and achieving an acceptable validation accuracy a submission was created that is stored in the submission directory.
  • Executed multiple Spark SQL queries after forming the Database to gather specific data corresponding to an image.
  • Snowflake solution creates a Snowflake Service Catalog Portfolio, a ‘SnowflakeEnduserGroup’ AWS IAM group
  • Provisions Secrets Manager to store and retrieve Snowflake connection information
  • Validate that a new Snowflake integration object has been created
  • Designed stream processing job used by Spark Streaming which is coded in Scala.
  • Ingested information from several sources like Kafka, Flume, and TCP sockets.
  • Provisions S3 bucket, a role for Snowflake, and the policies required to give Snowflake access to the bucket.
  • Processed data using advanced algorithms expressed with high-level functions like map, reduce, join and window.
  • Worked on Snowflake services such as Infrastructure Management, Data Optimization, Metadata Management, and Security.
  • Translate legacy code to SnowSQL, including SQL, Stored Procs, ETLs, and other complex code types.
  • User or service account in Snowflake with sufficient privileges to provision
  • Read train and test data into the data directory as well as into Spark variables for easy access and proceeded to train the data based on a sample submission.
  • Images are represented as NumPy arrays when displayed; for easier data manipulation, all images are stored as NumPy arrays.
  • Created a validation set using Keras2DML in order to test whether the trained model was working as intended or not.
  • Defined multiple helper functions that are used while running the neural network in session. Also defined placeholders and number of neurons in each layer.
  • Fed inbound events into the Scala-project-inbound topic in order to check if the window summary event functions as intended or not.
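
The window summary over inbound events mentioned above can be illustrated with a minimal sketch. The original job was a Spark Streaming application written in Scala; this is plain Python with no Spark dependency, and the event layout (timestamp, key, value), the tumbling-window choice and the function name are assumptions made for the example.

```python
from collections import defaultdict


def tumbling_window_summary(events, window_seconds):
    """Aggregate events into fixed, non-overlapping time windows.

    events: iterable of (timestamp_seconds, key, value) tuples.
    Returns a dict mapping (window_start, key) to a count and a total.
    """
    summary = defaultdict(lambda: {"count": 0, "total": 0.0})
    for timestamp, key, value in events:
        # Align the timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        bucket = summary[(window_start, key)]
        bucket["count"] += 1
        bucket["total"] += value
    return dict(summary)


if __name__ == "__main__":
    # Hypothetical inbound events: (timestamp in seconds, key, value).
    inbound = [(0, "a", 1.0), (5, "a", 2.0), (10, "a", 3.0), (12, "b", 4.0)]
    for window_key, stats in sorted(tumbling_window_summary(inbound, 10).items()):
        print(window_key, stats)
```

Feeding a small, hand-checked batch of events like this through the real job is one way to confirm that the window summary behaves as intended.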

Environment: Scala, Python, PySpark, Spark, Spark MLlib, Spark SQL, TensorFlow, NumPy, Keras, PowerBI
